Patent abstract:
A method uses natural language processing for visual analysis of a data set by a computer. The computer displays a data visualization based on a data set retrieved from a database using a first set of one or more database queries. The computer receives user input (for example, keyboard or voice) specifying a natural language command related to the displayed data visualization. Based on the displayed data visualization, the computer extracts one or more independent analytic phrases from the natural language command. The computer then calculates a set of one or more conversation centers associated with the natural language command based on the set of one or more analytic phrases. The computer then calculates a set of analytic functions associated with the set of one or more conversation centers, thereby creating a set of one or more functional phrases. The computer then updates the data visualization based on the set of one or more functional phrases.
Publication number: BR112019012110A2
Application number: R112019012110-2
Filing date: 2018-05-03
Publication date: 2019-10-29
Inventors: Angel Xuan Chang; Enamul Hoque Prince; Isaac J. Dykeman; Melanie K. Tory; Richard C. Gossweiler III; Sarah E. Battersby; Vidya R. Setlur
Applicant: Tableau Software Inc.
IPC main classification:
Patent description:

"SYSTEMS AND METHODS OF APPLICATION OF PRAGMATIC PRINCIPLES FOR INTERACTION WITH VISUAL ANALYTICS"
TECHNICAL FIELD [001] The disclosed implementations relate generally to data visualization and, more specifically, to systems, methods, and interfaces that enable users to interact with and explore data sets using a natural language interface.
BACKGROUND OF THE INVENTION [002] Data visualization applications enable a user to understand a data set visually, including the distribution, trends, outliers, and other factors that are important for making business decisions. Some data sets are very large or complex and include many data fields. A variety of tools can be used to help understand and analyze the data, including dashboards that can have multiple data views. However, it can be difficult to use or find certain features within a complex user interface. Most systems return only very basic interactive views in response to queries, and others require advanced modeling to create effective queries. Other systems require simple closed-ended questions and then return a single text response or a static view.
SUMMARY [003] Consequently, there is a need for tools that allow users to effectively use the functionality offered by data visualization applications. One solution to the problem is to offer a natural language interface as part of a data visualization application (for example, within the user interface of the
data visualization application) for an interactive query dialog that provides graphical responses to natural language queries. The natural language interface allows users to access complex functionality using common questions or commands. Questions and insights often arise from previous questions and from patterns that a person sees in the data. By modeling the interaction behavior as a conversation, the natural language interface can apply principles of pragmatics to improve interaction with visual analytics. Through various techniques for deducing the grammatical and lexical structure of utterances and their context, the natural language interface supports several pragmatic forms of natural language interaction with visual analytics. These pragmatic forms include understanding incomplete utterances, referring to entities within utterances and visualization properties, supporting long and compound utterances, identifying synonyms and related concepts, and "repairing" responses to previous utterances. Additionally, the natural language interface provides appropriate visualization responses both within an existing visualization and by creating new visualizations when necessary, and resolves ambiguity through targeted textual feedback and ambiguity widgets. In this way, the natural language interface allows users to efficiently explore the displayed data (for example, in a data visualization) within the data visualization application.
[004] According to some implementations, a method is executed on an electronic device with a display. For example, the electronic device can be a smartphone, a tablet, a notebook computer, or a desktop computer. The device displays a data visualization based on a data set retrieved from a database using a first set of one or more database queries. A user
specifies a first natural language command related to the displayed data visualization. Based on the displayed data visualization, the device extracts a first set of one or more independent analytical phrases from the first natural language command. The device then calculates a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The device then calculates a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases. The device then updates the data view based on the first set of one or more functional phrases.
[005] In some implementations, the device receives a second natural language command related to the updated data visualization. After receiving the second natural language command, the device extracts a second set of one or more independent analytical phrases from the second natural language command, and calculates a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases, according to some implementations. The device then derives a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules. The device calculates a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases. The device then updates the data view based on the second set of one or more functional phrases.
[006] In some implementations, each of the conversation centers in the
first set of one or more conversation centers, the temporary set of one or more conversation centers, and the second set of one or more conversation centers comprises a value for a variable (for example, a data attribute or a view property). In such implementations, the device uses the transition rules by performing a sequence of operations that comprises: determining whether a first variable is included in the first set of one or more conversation centers; determining whether the first variable is included in the temporary set of one or more conversation centers; determining a respective transition rule of the one or more transition rules to be applied based on whether the first variable is included in the first set of one or more conversation centers and/or in the temporary set of one or more conversation centers; and applying the respective transition rule.
[007] In some implementations, the one or more transition rules used by the device comprise a CONTINUE rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and add one or more conversation centers from the temporary set of one or more conversation centers to the second set of one or more conversation centers.
[008] In some of these implementations, the device applies the respective transition rule by carrying out a sequence of operations which comprises: according to a determination that (i) the first variable is included in the temporary set of one or more conversation centers, and (ii) the first variable is not included in the first set of one or more conversation centers, the device applies the CONTINUE rule to include the first variable in the second set of one or more conversation centers.
[009] In some implementations, the one or more transition rules used by the device comprise a RETAIN rule to retain each conversation center
in the first set of one or more conversation centers in the second set of one or more conversation centers without adding any conversation center from the temporary set of one or more conversation centers to the second set of one or more conversation centers.
[010] In some of these implementations, the device applies the respective transition rule by carrying out a sequence of operations which comprises: according to a determination that (i) the first variable is included in the first set of one or more conversation centers, and (ii) the first variable is not included in the temporary set of one or more conversation centers, apply the RETAIN rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers.
[011] In some implementations, the one or more transition rules used by the device comprise a SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and replace one or more conversation centers in the second set of one or more conversation centers with conversation centers from the temporary set of one or more conversation centers.
[012] In some of these implementations, the device applies the respective transition rule by carrying out a sequence of operations which comprises: according to a determination that (i) the first variable is included in the first set of one or more conversation centers, and (ii) the first variable is included in the temporary set of one or more conversation centers: determining whether a first value of the first variable in the first set of one or more conversation centers is different from a second value of the first variable in the temporary set of one or more conversation centers;
according to a determination that the first value is different from the second value, apply the SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and replace the value for the first variable in the second set of one or more conversation centers with the second value.
[013] In some of these implementations, the device additionally determines whether a widget corresponding to the first variable has been removed by the user; and, according to a determination that the widget has been removed, applies the SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and replaces the value for the first variable in the second set of one or more conversation centers with a new value (for example, a maximum value, or a superset value) that includes the first value.
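For illustration only, the following minimal Python sketch (not part of the disclosed implementations; the per-variable application and all identifiers are hypothetical) shows how the CONTINUE, RETAIN, and SHIFT rules described above could be applied to conversation centers represented as dictionaries mapping variables to values.

```python
def apply_transition_rules(prev_centers, temp_centers):
    """Derive the second set of conversation centers from the previous set
    and the temporary set extracted from the latest utterance.

    prev_centers, temp_centers: dicts mapping a variable (e.g. a data
    attribute such as "price") to its value (e.g. "< 1000000").
    """
    new_centers = dict(prev_centers)  # RETAIN: keep every previous center
    for variable, value in temp_centers.items():
        if variable not in prev_centers:
            # CONTINUE: the utterance introduces a new variable; add it.
            new_centers[variable] = value
        elif prev_centers[variable] != value:
            # SHIFT: the utterance changes the value of an existing
            # variable; replace the old value with the new one.
            new_centers[variable] = value
    return new_centers

# Example: "houses under 1M" followed by "townhomes under 600k".
prev = {"home_type": "house", "price": "< 1000000"}
temp = {"home_type": "townhome", "price": "< 600000"}
print(apply_transition_rules(prev, temp))
# {'home_type': 'townhome', 'price': '< 600000'}
```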
[014] In some implementations, the device creates a first set of one or more queries based on the first set of one or more functional phrases, and requeries the database using the first set of one or more queries, thereby retrieving a second data set, and then displays an updated data view using the second data set. In some implementations, the device creates a second set of one or more queries based on the second set of one or more functional phrases, and requeries the database using the second set of one or more queries, thereby retrieving a third data set, and then displays an updated data view using the third data set. In some cases, the database query is repeated locally on the computing device using data stored or cached on the computing device. For example, query repetition is usually performed locally when the natural language command specifies one or
more filters. In some implementations, the device additionally creates and displays a new data view using the second data set or the third data set.
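As a purely hypothetical illustration of how a set of functional phrases might be translated into a database query (the query language, table name, and phrase representation below are assumptions, not part of the disclosure):

```python
def build_where_clause(functional_phrases):
    """Translate functional phrases of the form (attribute, operator, value)
    into a SQL WHERE clause."""
    predicates = []
    for attribute, operator, value in functional_phrases:
        if isinstance(value, str):
            predicates.append(f"{attribute} {operator} '{value}'")
        else:
            predicates.append(f"{attribute} {operator} {value}")
    return " AND ".join(predicates)

phrases = [("home_type", "=", "townhome"), ("price", "<", 600000)]
print(f"SELECT * FROM listings WHERE {build_where_clause(phrases)}")
# SELECT * FROM listings WHERE home_type = 'townhome' AND price < 600000
```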
[015] In some implementations, the device additionally determines whether the user selected a data set different from the first data set, or whether the user reset the data visualization, and in this case, resets each of the first set of one or more conversation centers, the temporary set of one or more conversation centers, and the second set of one or more conversation centers to an empty set that does not include any conversation centers.
[016] Typically, an electronic device includes one or more processors, memory, a display, and one or more programs stored in the memory. The programs are configured to be executed by the one or more processors and are configured to perform any of the methods described herein. The one or more programs include instructions for displaying a data view based on a first set of data retrieved from a database using a first set of one or more queries. The one or more programs also include instructions for receiving first input from the user to specify a first natural language command related to the data visualization. The one or more programs also include instructions for extracting a first set of one or more analytical phrases from the first natural language command. The one or more programs also include instructions for calculating a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The one or more programs also include instructions for calculating a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating
a first set of one or more functional phrases, and updating the data visualization based on the first set of one or more functional phrases.
[017] In some implementations, the one or more programs include instructions for receiving a second input from the user to specify a second natural language command related to the updated data visualization. The one or more programs also include instructions for extracting a second set of one or more analytical phrases from the second natural language command. The one or more programs also include instructions for calculating a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases. The one or more programs also include instructions for deriving a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules. The one or more programs also include instructions for calculating a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases, and updating the data view based on the second set of one or more functional phrases.
[018] In some implementations, a non-transitory computer-readable storage medium stores one or more programs configured to be executed by a computing device having one or more processors, memory, and a display. The one or more programs are configured to perform any of the methods described herein. The one or more programs include instructions for displaying a data visualization based on a first set of data retrieved from a database using a
first set of one or more queries. The one or more programs also include instructions for receiving first input from the user to specify a first natural language command related to data visualization. The one or more programs also include instructions for extracting a first set of one or more analytical phrases from the first natural language command. The one or more programs also include instructions for calculating a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The one or more programs also include instructions for calculating a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases, and updating the data view based on the first set of one or more functional phrases.
[019] In some implementations, the one or more programs include instructions to receive a second input from the user to specify a second natural language command related to the updated data visualization. The one or more programs also include instructions for extracting a second set of one or more analytical phrases from the second natural language command. The one or more programs also include instructions for calculating a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases. The one or more programs also include instructions for deriving a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules. The one or more programs also include instructions for calculating a second set of one or more analytical functions associated with the second set of one
or more conversation centers, thereby creating a second set of one or more functional phrases, and updating the data view based on the second set of one or more functional phrases.
[020] In another aspect, according to some implementations, the device displays a data visualization based on a data set retrieved from a database using a first set of one or more database queries. A user specifies a first natural language command related to the displayed data view. Based on the displayed data visualization, the device extracts a first set of one or more independent analytical phrases from the first natural language command. The device then calculates a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The device then calculates a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases. The device then updates the data view based on the first set of one or more functional phrases. The user specifies a second natural language command related to the updated data view. After receiving the second natural language command, the device extracts a second set of one or more independent analytical phrases from the second natural language command, and calculates a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases. The device then calculates the cohesion between the first set of one or more analytical phrases and the second set of one or more analytical phrases. The device then derives a second set of one or more conversation centers from the first set of one or more conversation centers and
the temporary set of one or more conversation centers based on cohesion. The device calculates a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases. The device then updates the data view based on the second set of one or more functional phrases.
[021] In some implementations, the device calculates cohesion and derives the second set of one or more conversation centers by performing a sequence of operations that comprises: identifying a sentence structure from the second set of one or more analytical phrases; identifying one or more forms of pragmatics based on the sentence structure; and deriving the second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers based on the one or more forms of pragmatics.
[022] In some of these implementations, the device identifies the sentence structure by performing a sequence of operations that comprises: parsing the second natural language command by applying a probabilistic grammar, thereby obtaining a parsed output; and resolving the parsed output into corresponding data and categorical attributes. In some of such implementations, parsing the second natural language command additionally involves deducing the syntactic structure using a part-of-speech analysis API (for example, a POS tagger) provided by a natural language processing toolkit.
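A minimal sketch of the kind of part-of-speech analysis mentioned above, using the NLTK toolkit as one possible natural language processing library (the toolkit choice and the attribute lexicon are assumptions made only for illustration):

```python
import nltk

# One-time model downloads, if not already present:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

ATTRIBUTES = {"price", "beds", "home_type"}  # hypothetical data attributes

def pos_analyze(command):
    """Tokenize the command, tag each token with its part of speech, and
    flag tokens that resolve to known data attributes."""
    tagged = nltk.pos_tag(nltk.word_tokenize(command))
    return [(token, tag, token.lower() in ATTRIBUTES) for token, tag in tagged]

print(pos_analyze("show houses under 1M with at least 3 beds"))
```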
[023] In some implementations, the device identifies one or more forms of pragmatics by performing a sequence of operations that comprises determining whether the second natural language command is an incomplete
utterance (sometimes called an ellipsis) by determining whether one or more linguistic elements are missing from the sentence structure. In some of these implementations, the device derives the second set of one or more conversation centers by performing a sequence of operations that comprises: according to the determination that the second natural language command is an incomplete utterance: determining a first subset of conversation centers in the first set of one or more conversation centers, the first subset of conversation centers corresponding to the one or more linguistic elements missing from the sentence structure; and calculating the second set of one or more conversation centers by combining the temporary set of one or more conversation centers with the first subset of conversation centers.
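A hedged sketch of this ellipsis handling (the incompleteness test, the center representation, and the example vocabulary are all assumptions): when the new utterance omits linguistic elements, the missing elements are carried over from the previous conversation centers.

```python
def resolve_ellipsis(prev_centers, temp_centers, tagged_tokens):
    """Handle an incomplete utterance such as "townhomes" following
    "houses under 1M": if linguistic elements are missing, fill them in
    from the previous conversation centers."""
    # Hypothetical incompleteness test: no verb and no comparison word.
    has_verb = any(tag.startswith("VB") for _, tag in tagged_tokens)
    has_comparison = any(token in {"under", "over", "between"}
                         for token, _ in tagged_tokens)
    if has_verb or has_comparison:
        return dict(temp_centers)            # utterance is complete enough
    # Previous centers for elements the new utterance did not mention.
    carried_over = {var: val for var, val in prev_centers.items()
                    if var not in temp_centers}
    return {**carried_over, **temp_centers}

prev = {"home_type": "house", "price": "< 1000000"}
temp = {"home_type": "townhome"}             # "townhomes" -- price omitted
print(resolve_ellipsis(prev, temp, [("townhomes", "NNS")]))
# {'price': '< 1000000', 'home_type': 'townhome'}
```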
[024] In some implementations, the device identifies one or more forms of pragmatics by performing a sequence of operations that comprises determining whether the second natural language command is a reference expression by determining whether one or more anaphoric references are present in the sentence structure; and the device derives the second set of one or more conversation centers by carrying out another sequence of operations which comprises: according to the determination that the second natural language command is a reference expression: searching the first set of one or more conversation centers to find a first subset of conversation centers that corresponds to a phrasal chunk in the second natural language command that contains a first anaphoric reference of the one or more anaphoric references; and calculating the second set of one or more conversation centers based on the temporary set of one or more conversation centers and the first subset of conversation centers.
[025] In some of such implementations, the device additionally determines whether the first anaphoric reference is accompanied by a verb in the
second natural language command, and in this case, searches the first set of one or more conversation centers to find a first action conversation center that refers to an action verb (for example, "filter"); and calculates the second set of one or more conversation centers based on the temporary set of one or more conversation centers, the first subset of conversation centers, and the first action conversation center.
[026] In some of these implementations, the device determines whether the first anaphoric reference is a deictic reference that refers to some object in the environment, typically by pointing, and in this case, calculates the second set of one or more conversation centers based on the temporary set of one or more conversation centers and a characteristic of the object. Deictic references are typically enabled through multimodal interaction (for example, using a mouse in addition to speech or text).
[027] In some of these implementations, the device additionally determines whether the first anaphoric reference is a reference to a visualization property in the updated data visualization, and in this case, calculates the second set of one or more conversation centers based on the temporary set of one or more conversation centers and data related to the visualization property.
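A minimal, hypothetical sketch of this kind of anaphora handling (the reference-word list, center representation, and selected-mark structure are assumptions):

```python
ANAPHORA = {"that", "those", "these", "them", "it"}

def resolve_anaphora(command_tokens, prev_centers, selected_mark=None):
    """Resolve references such as "those" in "show only those under 600k".

    prev_centers: previous conversation centers, e.g. {"home_type": "townhome"}.
    selected_mark: attributes of an object the user pointed at (deictic case).
    """
    referenced = {}
    if any(token.lower() in ANAPHORA for token in command_tokens):
        if selected_mark is not None:
            # Deictic reference: use the characteristics of the object the
            # user indicated (for example, with the mouse).
            referenced.update(selected_mark)
        else:
            # Anaphoric reference to entities mentioned earlier: reuse the
            # corresponding subset of the previous conversation centers.
            referenced.update(prev_centers)
    return referenced

print(resolve_anaphora(["show", "only", "those", "under", "600k"],
                       {"home_type": "townhome"}))
# {'home_type': 'townhome'}
```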
[028] In some implementations, the device identifies one or more forms of pragmatics by performing a sequence of operations that comprises determining whether the second natural language command is a repair utterance by determining whether the sentence structure corresponds to one or more predefined repair utterances (say, to repair a possible ambiguity in the first natural language command or in the way the results are presented to the user). For example, the user says "get rid of
condominium" or "change from condominium to home". In such implementations, if the device determines that the second natural language command is a repair utterance, the device calculates the second set of one or more conversation centers based on the temporary set of one or more conversation centers; and updates one or more data attributes in the second set of one or more conversation centers based on the one or more predefined repair utterances and the sentence structure.
[029] In some of such implementations, the device determines whether the sentence structure corresponds to a repair utterance for altering a default behavior related to the display of a data visualization (for example, highlighting versus filtering, such as in response to "no, filter instead"), in which case the device changes the default display-related behavior.
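A hypothetical sketch of recognizing repair utterances of these two kinds and updating the affected data attribute or default behavior (the patterns, center representation, and defaults are illustrative assumptions):

```python
import re

def apply_repair(command, centers, defaults):
    """Handle repair utterances such as "change from condominium to home"
    or "no, filter instead"."""
    centers, defaults = dict(centers), dict(defaults)

    match = re.search(r"change from (\w+) to (\w+)", command)
    if match:
        old_value, new_value = match.group(1), match.group(2)
        for attribute, value in centers.items():
            if value == old_value:
                centers[attribute] = new_value    # repair the attribute value

    if re.search(r"\bno,? filter instead\b", command):
        defaults["selection_behavior"] = "filter" # repair the default behavior

    return centers, defaults

centers = {"home_type": "condominium", "price": "< 1000000"}
defaults = {"selection_behavior": "highlight"}
print(apply_repair("change from condominium to home", centers, defaults))
# ({'home_type': 'home', 'price': '< 1000000'}, {'selection_behavior': 'highlight'})
```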
[030] In some implementations, the device identifies one or more forms of pragmatics by carrying out a sequence of operations that comprises determining whether the second natural language command is a conjunctive expression by (i) determining the implicit or explicit presence of conjunctions in the sentence structure, and (ii) determining whether the temporary set of one or more conversation centers includes each conversation center in the first set of one or more conversation centers. In such implementations, the device derives the second set of one or more conversation centers by performing another sequence of operations that comprises: according to the determination that the second natural language command is a conjunctive expression, calculating the second set of one or more conversation centers based on the temporary set of one or more conversation centers.
[031] In some of such implementations, the device determines whether the second natural language command has more than one conjunct; and, according
to the determination that the second natural language command has more than one conjunct, the device calculates the second set of one or more analytical functions by linearizing the second natural language command. In some of these implementations, the device linearizes the second natural language command by performing a sequence of operations that comprises: generating a parse tree for the second natural language command; traversing the parse tree in post-order to extract a first analytical phrase and a second analytical phrase, where the first analytical phrase and the second analytical phrase are adjacent nodes in the parse tree; calculating a first analytical function and a second analytical function corresponding to the first analytical phrase and the second analytical phrase, respectively; and combining the first analytical function with the second analytical function by applying one or more logical operators based on one or more characteristics of the first analytical function and the second analytical function, where the one or more characteristics include attribute type, operator type, and value.
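For illustration, a minimal sketch of a post-order linearization of this kind, assuming a simple binary parse tree whose leaves are analytical phrases (the tree representation and the placeholder combination step are hypothetical; a fuller sketch of the combination logic follows paragraph [037]):

```python
class Node:
    """Binary parse-tree node: leaves hold analytical phrases as
    (attribute, operator, value) tuples; internal nodes hold the spoken
    conjunction, although the logical operator actually applied is chosen
    later from the characteristics of the combined functions."""
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right

def to_analytical_function(phrase):
    attribute, operator, value = phrase
    return {"attribute": attribute, "operator": operator, "value": value}

def linearize(node, combine):
    """Traverse the parse tree in post-order, turning each leaf into an
    analytical function and combining adjacent results."""
    if node.left is None and node.right is None:
        return to_analytical_function(node.label)       # leaf
    left = linearize(node.left, combine)
    right = linearize(node.right, combine)
    return combine(left, right)                         # adjacent nodes

# "townhomes and condos under 600k" (structure assumed for illustration).
tree = Node("and",
            Node(("home_type", "==", "townhome")),
            Node("and",
                 Node(("home_type", "==", "condo")),
                 Node(("price", "<", 600000))))
result = linearize(tree, lambda a, b: {"op": "combine", "args": [a, b]})
print(result)
```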
[032] In some of such implementations, the first analytical function comprises a first attribute (sometimes called a variable herein, and including a visualization property), a first operator, and a first value; and the second analytical function comprises a second attribute (sometimes called a variable herein, and including a visualization property), a second operator, and a second value.
[033] In some of such implementations, combining the first analytical function with the second analytical function comprises: determining whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute
are identical and both are categorical attributes, apply a union operator to combine the first analytical function and the second analytical function.
[034] In some of such implementations, combining the first analytical function with the second analytical function comprises: determining whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute are not identical, apply the intersection operator to combine the first analytical function and the second analytical function.
[035] In some of such implementations, combining the first analytical function with the second analytical function comprises: determining whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute are identical, and both are ordered type attributes: determining the operator types of the first operator and the second operator; and, according to a determination that both the first operator and the second operator are equality operators, apply the union operator to combine the first analytical function and the second analytical function.
[036] In some of such implementations, combining the first analytical function with the second analytical function comprises: determining whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute
are identical, and both are ordered type attributes: determining the operator types of the first operator and the second operator; and, according to a determination that the first operator is a "less than" operator and the second operator is a "greater than" operator: determining whether the first value is less than the second value; and, according to a determination that the first value is less than the second value, apply the union operator to combine the first analytical function and the second analytical function.
[037] In some of such implementations, combining the first analytical function with the second analytical function comprises: determining whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute are identical, and both are ordered type attributes: determining the operator types of the first operator and the second operator; and, according to a determination that the first operator is a "greater than" operator and the second operator is a "less than" operator: determining whether the first value is less than the second value; and, according to a determination that the first value is less than the second value, apply the intersection operator to combine the first analytical function and the second analytical function.
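The decision logic of paragraphs [033] through [037] might be sketched as follows for two leaf-level analytical functions (the attribute typing and the fallback branch are assumptions made for illustration):

```python
CATEGORICAL = {"home_type", "city"}          # hypothetical attribute typing
ORDERED = {"price", "beds", "sqft"}

def combine(f1, f2):
    """Combine two analytical functions, each a dict with "attribute",
    "operator", and "value" keys, using a union or intersection operator."""
    a1, a2 = f1["attribute"], f2["attribute"]
    if a1 != a2:
        return {"op": "intersection", "args": [f1, f2]}   # e.g. condos under 600k
    if a1 in CATEGORICAL:
        return {"op": "union", "args": [f1, f2]}          # e.g. townhomes or condos
    # Identical ordered attributes: inspect operators and values.
    o1, o2 = f1["operator"], f2["operator"]
    v1, v2 = f1["value"], f2["value"]
    if o1 == "==" and o2 == "==":
        return {"op": "union", "args": [f1, f2]}
    if o1 == "<" and o2 == ">" and v1 < v2:
        return {"op": "union", "args": [f1, f2]}          # two disjoint ranges
    if o1 == ">" and o2 == "<" and v1 < v2:
        return {"op": "intersection", "args": [f1, f2]}   # one bounded range
    return {"op": "intersection", "args": [f1, f2]}       # fallback assumption

f1 = {"attribute": "price", "operator": ">", "value": 200000}
f2 = {"attribute": "price", "operator": "<", "value": 600000}
print(combine(f1, f2)["op"])   # intersection: the range 200k to 600k
```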
[038] In some implementations, the device additionally calculates the semantic relatedness between the second set of one or more extracted analytical phrases and one or more data attributes included in the updated data view, and calculates analytical functions associated with the second set of one or more analytical phrases, thereby creating the second set of one or more functional phrases, based on the semantic relatedness to the one or more data attributes. Unlike grammatical cohesion, or cohesion between contexts, lexical cohesion
seeks cohesion within the context.
[039] In some of these implementations, the device calculates the semantic relatedness by performing a sequence of operations that comprises: training a first neural network model on a large text corpus, thereby learning vector representations of words (word embeddings); calculating a first word vector for a first word in a first phrase in the second set of one or more analytical phrases using a second neural network model, the first word vector mapping the first word to the vector representations of words; calculating a second word vector for a first data attribute of the one or more data attributes using the second neural network model, the second word vector mapping the first data attribute to the vector representations of words; and calculating the relatedness between the first word vector and the second word vector using a similarity metric.
[040] In some of these implementations, the first neural network model comprises the Word2vec® model. In some of these implementations, the second neural network model comprises a recurrent neural network model.
[041] In some of such implementations, the similarity metric is based at least on (i) the Wu-Palmer distance between the word senses associated with the first word vector and the second word vector, (ii) a weighting factor, and (iii) a pairwise cosine distance between the first word vector and the second word vector.
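A hedged sketch of a similarity metric of this general shape, combining a Wu-Palmer score (computed here with NLTK's WordNet interface) and a cosine term over externally supplied word vectors; the weighting factor, the vector source, and the exact combination are assumptions:

```python
import numpy as np
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet")

def wu_palmer(word1, word2):
    """Best Wu-Palmer similarity over the WordNet senses of two words."""
    scores = [s1.wup_similarity(s2) or 0.0
              for s1 in wn.synsets(word1) for s2 in wn.synsets(word2)]
    return max(scores, default=0.0)

def cosine(v1, v2):
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def relatedness(word1, vec1, word2, vec2, alpha=0.5):
    """Weighted combination of sense-level (Wu-Palmer) and vector-level
    (cosine) similarity; alpha is a hypothetical weighting factor."""
    return alpha * wu_palmer(word1, word2) + (1 - alpha) * cosine(vec1, vec2)

# Word vectors would normally come from a model such as word2vec; random
# vectors are used here only so that the sketch runs end to end.
rng = np.random.default_rng(0)
print(relatedness("cheap", rng.normal(size=50), "price", rng.normal(size=50)))
```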
[042] In some of these implementations, the device calculates the analytical functions by performing a sequence of operations that comprises: obtaining word definitions for the second set of one or more analytical phrases from a publicly available dictionary; determining whether the word definitions contain one or more predefined adjectives using a part-of-speech
analysis API provided by a natural language processing toolkit; and, according to the determination that the word definitions contain one or more predefined adjectives, mapping the one or more predefined adjectives to one or more analytical functions.
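As an illustration only, a simplified sketch of this dictionary-based mapping (the glosses, the adjective table, and the lookup against a fixed adjective list in place of a POS tagger are all assumptions):

```python
# Hypothetical dictionary glosses; a real implementation might query a
# publicly available dictionary instead.
GLOSSES = {"cheap": "low in price", "large": "of great size or extent"}

# Predefined adjectives mapped to analytical functions over data attributes.
ADJECTIVE_TO_FUNCTION = {
    "low":   ("price", "sort_ascending"),
    "great": ("sqft", "sort_descending"),
}

def functions_for_word(word):
    """Map a word such as "cheap" to analytical functions by inspecting
    the adjectives found in its dictionary definition."""
    gloss = GLOSSES.get(word, "")
    return [ADJECTIVE_TO_FUNCTION[token] for token in gloss.split()
            if token in ADJECTIVE_TO_FUNCTION]

print(functions_for_word("cheap"))   # [('price', 'sort_ascending')]
```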
[043] Typically, an electronic device includes one or more processors, memory, a display, and one or more programs stored in the memory. The programs are configured to be executed by the one or more processors and are configured to perform any of the methods described herein. The one or more programs include instructions for displaying a data view based on a first set of data retrieved from a database using a first set of one or more queries. The one or more programs also include instructions for receiving first input from the user to specify a first natural language command related to data visualization. The one or more programs also include instructions for extracting a first set of one or more analytical phrases from the first natural language command. The one or more programs also include instructions for calculating a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The one or more programs also include instructions for calculating a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases. The one or more programs also include instructions for updating the data view based on the first set of one or more functional phrases. The one or more programs also include instructions for receiving a second input from the user to specify a second natural language command related to the updated data visualization. The one or more programs also include
instructions for extracting a second set of one or more independent analytical phrases from the second natural language command. The one or more programs also include instructions for calculating a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases. The one or more programs also include instructions for calculating the cohesion between the first set of one or more analytical phrases and the second set of one or more analytical phrases. The one or more programs also include instructions for deriving a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers based on cohesion. The one or more programs also include instructions for calculating a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases, and updating the data view based on the second set of one or more functional phrases.
[044] In some implementations, a non-transitory computer-readable storage medium stores one or more programs configured to be executed by a computing device having one or more processors, memory, and a display. The one or more programs are configured to perform any of the methods described herein. The one or more programs include instructions for displaying a data view based on a first set of data retrieved from a database using a first set of one or more queries. The one or more programs also include instructions for receiving first input from the user to specify a first natural language command related to data visualization. The one or more programs also include instructions for extracting a first set
of one or more analytical phrases from the first natural language command. The one or more programs also include instructions for calculating a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The one or more programs also include instructions for calculating a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases. The one or more programs also include instructions for updating the data view based on the first set of one or more functional phrases. The one or more programs also include instructions for receiving a second input from the user to specify a second natural language command related to the updated data visualization. The one or more programs also include instructions for extracting a second set of one or more independent analytical phrases from the second natural language command. The one or more programs also include instructions for calculating a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases. The one or more programs also include instructions for calculating the cohesion between the first set of one or more analytical phrases and the second set of one or more analytical phrases. The one or more programs also include instructions for deriving a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers based on cohesion. The one or more programs also include instructions for calculating a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases, and updating the data view based on the
second set of one or more functional phrases.
[045] In another aspect, according to some implementations, a method is executed on an electronic device with a display. For example, the electronic device can be a smartphone, a tablet, a notebook computer, or a desktop computer. The device displays a data visualization based on a data set retrieved from a database using a first set of one or more database queries. A user specifies a first natural language command related to the displayed data view. Based on the displayed data visualization, the device extracts a first set of one or more independent analytical phrases from the first natural language command. The device then calculates a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The device then calculates a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases. The device then updates the data view based on the first set of one or more functional phrases. The user specifies a second natural language command related to the updated data view. After receiving the second natural language command, the device extracts a second set of one or more independent analytical phrases from the second natural language command, and calculates a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases. The device then derives a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more
transition rules. The device then updates the data view based on the second set of one or more conversation centers.
[046] In some implementations, the device additionally determines one or more data attributes corresponding to the second set of one or more conversation centers; scans the displayed data visualizations to identify one or more of the displayed data visualizations that contain data marks whose characteristics correspond to a first data attribute of the one or more data attributes; and highlights the data marks whose characteristics correspond to the first data attribute. In any of these implementations, the device additionally filters, from the displayed data visualizations, results that contain data marks whose characteristics do not correspond to the one or more data attributes. In addition, in some of such implementations, the device receives input from the user to determine whether to filter or to highlight the data marks (for example, through a natural language command, such as "delete", "remove", and "just filter").
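For illustration, a minimal sketch (the data model is entirely hypothetical) of scanning the displayed visualizations for data marks that match a data attribute value and either highlighting or filtering them:

```python
def update_views(views, attribute, value, action="highlight"):
    """views: list of dicts, each holding a list of data marks; each mark
    is a dict of attribute values plus visual characteristics."""
    for view in views:
        matching = [mark for mark in view["marks"] if mark.get(attribute) == value]
        if not matching:
            continue
        if action == "filter":
            # Remove marks whose characteristics do not correspond.
            view["marks"] = matching
        else:
            # Highlight the matching marks (for example, via a color encoding).
            for mark in matching:
                mark["highlighted"] = True
    return views

views = [{"name": "price by neighborhood",
          "marks": [{"home_type": "condo", "price": 550000},
                    {"home_type": "house", "price": 900000}]}]
print(update_views(views, "home_type", "condo", action="filter"))
```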
[047] In some implementations, the visualization characteristics include one or more of color, size, and shape. In some implementations, the visualization characteristics correspond to a visual encoding of the data marks. In some implementations, the visual encoding is one or more of color, size, and shape.
[048] In some implementations, the device determines whether none of the displayed data visualizations contains data marks whose characteristics correspond to the first data attribute, and in this case, generates a specification for a new data visualization with the first data attribute (for example, aggregation types) and displays the new data visualization. In some of such implementations, displaying the new data visualization additionally comprises determining a chart type based on the specification; and generating and
displaying the chart. In addition, in some of these implementations, the chart is positioned using a layout algorithm based on a two-dimensional grid, automatically coordinating with other data visualizations (sometimes called "views" herein).
[049] In some implementations, the device additionally performs a sequence of operations comprising: calculating a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases; selecting a first functional phrase from the second set of one or more functional phrases, in which the first functional phrase comprises a parameterized data selection criterion; selecting an initial range for parameter values of the parameterized data selection criterion; displaying an editable user interface control (for example, a widget) corresponding to the parameterized data selection criterion, where the user interface control displays the current values of the parameters; and ordering a displayed set of one or more editable user interface controls based on the order of the queries in the second natural language command, in which the order of the queries is inferred while extracting the second set of one or more analytical phrases from the second natural language command. In some of these implementations, the user interface control allows adjustment of the first functional phrase. In addition, in some of such implementations, the user interface control displays a slider, which allows a user to adjust the first functional phrase. In some of such implementations, ordering the displayed set of one or more editable user interface controls additionally comprises using a library that facilitates the compact placement of small word-scale visualizations within text. In some of these implementations, the library is Sparklificator®.
[050] In some implementations, the device performs a sequence of operations aimed at automatically correcting some user errors. The sequence of operations comprises: determining a first token in the second natural language command that does not match any of the analytical phrases in the second set of one or more analytical phrases (for example, due to a parsing failure); searching for a correctly spelled term corresponding to the first token using a search library, by comparing the first token with one or more aspects of the first data set; substituting the correctly spelled term for the first token in the second natural language command to obtain a third natural language command; and extracting the second set of one or more analytical phrases from the third natural language command. In some of such implementations, the one or more aspects include data attributes, cell values, and related keywords from the first data set. In some of such implementations, the search library is a fuzzy string library, such as Fuse.js®.
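A sketch of this autocorrection step using Python's standard-library difflib in place of a JavaScript fuzzy-matching library such as Fuse.js (the vocabulary and cutoff are assumptions):

```python
import difflib

VOCABULARY = ["price", "beds", "home_type", "condominium", "townhome",
              "under", "over", "cheapest"]   # attributes, cell values, keywords

def autocorrect(command, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace unrecognized tokens with the closest correctly spelled term;
    tokens with no close match are left in place for later pruning."""
    corrected = []
    for token in command.split():
        if token.lower() in vocabulary:
            corrected.append(token)
            continue
        match = difflib.get_close_matches(token.lower(), vocabulary,
                                          n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)

print(autocorrect("cheepest condominum under 1M"))
# cheapest condominium under 1M
```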
[051] In some of such implementations, the device performs a sequence of operations comprising: determining whether there is no correctly spelled term corresponding to the first token; and, according to a determination that there is no correctly spelled term corresponding to the first token: parsing the second natural language command to obtain a parse tree; pruning the parse tree to remove the portion of the tree corresponding to the first token; and extracting the second set of one or more analytical phrases based on the pruned parse tree.
[052] In some implementations, the device additionally generates textual feedback indicating that the first token was not recognized and was therefore removed from the second natural language command - a situation that
typically occurs when the utterance has been only partially understood. In some of such implementations, the device displays the first token.
[053] In some implementations, the device additionally generates textual feedback indicating that the correctly spelled term is used as a substitute for the first token in the second natural language command. This is usually the situation when the utterance was not successfully understood, but the device has suggested an alternative query. In addition, in some of such implementations, the device displays and highlights the correctly spelled term.
[054] Typically, an electronic device includes one or more processors, memory, a display, and one or more programs stored in the memory. The programs are configured to be executed by the one or more processors and are configured to perform any of the methods described herein. The one or more programs include instructions for displaying a data view based on a first set of data retrieved from a database using a first set of one or more queries. The one or more programs also include instructions for receiving first input from the user to specify a first natural language command related to the data visualization. The one or more programs also include instructions for extracting a first set of one or more analytical phrases from the first natural language command. The one or more programs also include instructions for calculating a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The one or more programs also include instructions for calculating a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases. The one or more programs also include instructions for updating the data view
based on the first set of one or more functional phrases. The one or more programs also include instructions for receiving a second input from the user to specify a second natural language command related to the updated data visualization. The one or more programs also include instructions for extracting a second set of one or more independent analytical phrases from the second natural language command. The one or more programs also include instructions for calculating a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases. The one or more programs also include instructions for deriving a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules. The one or more programs also include instructions for updating the data view based on the second set of one or more conversation centers, where the update comprises: determining one or more data attributes corresponding to the second set of one or more conversation centers; scanning the displayed data views to identify one or more of the displayed data views that contain data marks whose characteristics correspond to a first data attribute of the one or more data attributes; and highlighting the data marks whose characteristics correspond to the first data attribute.
[055] In some implementations, a non-transitory computer-readable storage medium stores one or more programs configured to be executed by a computing device having one or more processors, memory, and a display. The one or more programs are configured to perform any of the methods described herein. The one or more programs include instructions for displaying a data visualization based on
a first set of data retrieved from a database using a first set of one or more queries. The one or more programs also include instructions for receiving first input from the user to specify a first natural language command related to data visualization. The one or more programs also include instructions for extracting a first set of one or more analytical phrases from the first natural language command. The one or more programs also include instructions for calculating a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases. The one or more programs also include instructions for calculating a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases. The one or more programs also include instructions for updating the data view based on the first set of one or more functional phrases. The one or more programs also include instructions for receiving a second input from the user to specify a second natural language command related to the updated data visualization. The one or more programs also include instructions for extracting a second set of one or more independent analytical phrases from the second natural language command. The one or more programs also include instructions for calculating a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases. The one or more programs also include instructions for deriving a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules. The one or more programs also include instructions for updating the data view
based on the second set of one or more conversation centers, where the update comprises: determining one or more data attributes corresponding to the second set of one or more conversation centers; scanning the displayed data views to identify one or more of the displayed data views that contain data marks whose characteristics correspond to a first data attribute of the one or more data attributes; and highlighting the data marks whose characteristics correspond to the first data attribute.
[056] In this way, methods, systems, and graphical user interfaces are disclosed that allow users to efficiently explore the data displayed within a data visualization application.
[057] Both the foregoing general description and the following detailed description are exemplary and explanatory, and are intended to provide a better understanding of the invention as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS [058] For a better understanding of the systems, methods, and graphical user interfaces mentioned above, as well as of additional systems, methods, and graphical user interfaces that offer analytical data visualization tools, reference should be made to the Description of Implementations below, together with the following drawings, in which like reference numerals refer to corresponding parts throughout the figures.
[059] Figure 1 illustrates a graphical user interface used in some implementations.
[060] Figure 2 is a block diagram of a computing device according to some implementations.
[061] Figure 3A is a process flow chart illustrating a process for using natural language applying pragmatic principles for visual analysis of a data set according to some implementations; and Figure 3B is a
state machine diagram illustrating conversation center states and the transition between states when specific transition rules are triggered, according to some implementations.
[062] Figures 4A to 4B are diagrams illustrating the use of different transition rules in managing analytical conversations according to some implementations; and Figures 4C to 4G illustrate graphical user interfaces (which relate to Figures 4A to 4B) for interactive data analysis using natural language processing in a data visualization application according to some implementations.
[063] Figure 5 is a diagram illustrating a general framework for applying the principles of pragmatics to visual analytics according to some implementations.
[064] Figure 6A is a diagram illustrating the application of pragmatic principles to incomplete utterances (sometimes called “ellipsis” here) according to some implementations; and Figures 6B to 6D illustrate graphical user interfaces (which relate to Figure 6A) for interactive data analysis using natural language processing in a data visualization application according to some implementations.
[065] Figure 7A is a diagram illustrating the application of pragmatic principles for utterances with reference expressions (sometimes referred to here as anaphoric references) according to some implementations; and Figures 7B to 7D illustrate graphical user interfaces (which relate to Figure 7A) for interactive data analysis using natural language processing in a data visualization application according to some implementations.
[066] Figure 8A is a diagram illustrating the application of pragmatic principles to utterances with conjunctions according to some
implementations; Figure 8B illustrates a graphical user interface (which relates to Figure 8A) for interactive data analysis using natural language processing in a data visualization application according to some implementations; and Figure 8C illustrates how a system iteratively connects the analytical functions of adjacent nodes in a parse tree during linearization, according to some implementations.
[067] Figure 9A is a diagram illustrating the application of pragmatic principles to handle lexical cohesion according to some implementations; and Figures 9B to 9C illustrate graphical user interfaces (which relate to Figure 9A) for interactive data analysis using natural language processing in a data visualization application according to some implementations.
[068] Figure 10A is a diagram illustrating the application of pragmatic principles to repair utterances according to some implementations; and Figures 10B to 10C illustrate graphical user interfaces (which relate to Figure 10A) for interactive data analysis using natural language processing in a data visualization application according to some implementations.
[069] Figure 11A is a diagram illustrating the application of pragmatic principles to manage responses and feedback according to some implementations; Figure 11B is a data visualization that further illustrates the methodology illustrated in Figure 11A; and Figure 11C shows how a dashboard is built progressively based on the input utterances, according to some implementations.
[070] Figure 12A illustrates a set of widgets generated to manage ambiguity in a user's query according to some implementations; and Figure 12B illustrates examples of feedback for various situations according to
some implementations.
[071] Figures 13A-13J show a flowchart of a process that uses natural language for visual analysis of a data set applying principles of pragmatics, according to some implementations.
[072] Figures 14A to 14R present a flowchart of a process that uses natural language for visual analysis of a data set applying principles of pragmatics, including handling various forms of pragmatics, according to some implementations.
[073] Figures 15A to 15H present a flowchart of a process that uses natural language for visual analysis of a data set applying principles of pragmatics, including managing ambiguity in a user's query, according to some implementations.
[074] Reference will now be made to the implementations, examples of which are illustrated in the accompanying drawings. In the description that follows, numerous specific details are presented in order to enable a thorough understanding of the present invention. However, it will be clear to those of ordinary skill in the art that the present invention can be practiced without requiring these specific details.
DESCRIPTION OF IMPLEMENTATIONS [075] Figure 1 illustrates a graphical user interface 100 for interactive data analysis. User interface 100 includes a Data 114 tab, and an Analytics 116 tab according to some implementations. When the Data 114 tab is selected, user interface 100 displays a schema information region 110, which is also called a data panel. The schema information region 110 provides named data elements (e.g., field names) that can be selected and used to construct a data view. In some implementations, the list of field names is separated into
a group of dimensions (for example, categorical data) and a group of measures (for example, numerical quantities). Some implementations also include a list of parameters. When the Analytics 116 tab is selected, the user interface displays a list of analytical functions instead of the data elements (not shown).
[076] The graphical user interface 100 also includes a data visualization region 112. The data visualization region 112 includes a plurality of shelf regions, such as a column shelf region 120 and a row shelf region 122. These are also called the column shelf 120 and the row shelf 122. As illustrated here, the data visualization region 112 also has a large space for displaying a visual graph (also called the data visualization here). Since no data element has been selected so far, the space initially has no visual graph. In some implementations, the data visualization region 112 has multiple layers that are called sheets.
[077] In some implementations, the graphical user interface 100 also includes a natural language processing region 124. The natural language processing region 124 includes an input bar (also called the command bar here) for receiving natural language commands. A user can interact with the input bar to provide commands. For example, the user can type a command in the input bar to provide the command. In addition, the user can interact indirectly with the input bar by speaking into a microphone (for example, an audio input device 220) to provide commands. In some implementations, data elements are initially associated with the column shelf 120 and the row shelf 122 (for example, using drag and drop operations from the schema information region 110 to the column shelf 120 and/or to the row
shelf 122). After the initial association, the user can use natural language commands (for example, in the natural language processing region 124) to further explore the displayed data visualization. In some cases, a user creates the initial association using the natural language processing region 124, which results in one or more data elements being placed on the column shelf 120 and the row shelf 122. For example, the user can provide a command to create a relationship between data element X and data element Y. In response to receiving the command, the column shelf 120 and the row shelf 122 can be populated with data elements (for example, the column shelf 120 can be populated with data element X and the row shelf 122 can be populated with data element Y, or vice versa).
[078] Figure 2 is a block diagram illustrating a computing device 200, which can display the graphical user interface 100 according to some implementations. Various examples of computing device 200 include a desktop computer, a laptop computer, a tablet computer, and other computing devices that have a display medium and a processor capable of running a data visualization application 230. The computing device 200 typically includes one or more processing units (processors or cores) 202, one or more networks or other communications interfaces 204, memory 206 and one or more communications buses 208 to interconnect these components. Communication buses 208 optionally include sets of circuits (sometimes called chipsets) that interconnect and control communications between system components. The computing device 200 includes a user interface 210. User interface 210 typically includes a display device 212. In some implementations, computing device 200 includes
input devices, such as a keyboard, a mouse, and/or other input buttons 216. As an alternative or additionally, in some implementations, the display device 212 includes a touch-sensitive surface 214, in which case the display device 212 is a touch-sensitive display. In some implementations, the touch-sensitive surface 214 is configured to detect various sliding gestures (for example, continuous gestures in the vertical and/or horizontal directions) and/or other gestures (for example, single/double tap). On computing devices that have a touch-sensitive display 214, a physical keyboard is optional (for example, a virtual keyboard can be displayed when keyboard input is required). The user interface 210 also includes an audio output device 218, such as speakers, or an audio output connection connected to speakers, earphones or headphones. In addition, some computing devices 200 use a microphone and voice recognition to complement or replace the keyboard. Optionally, the computing device 200 includes an audio input device 220 (for example, a microphone) for capturing audio (for example, speech from a user).
[079] Memory 206 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other solid state random access memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices or other non-volatile solid state storage devices. In some implementations, memory 206 includes one or more storage devices located remotely from the processor(s) 202. Memory 206, or, alternatively, the non-volatile memory device(s) within memory 206, comprises a non-transitory computer-readable storage medium. In some implementations, memory 206, or the computer-readable
storage medium of memory 206, stores the following programs, modules and data structures, or a subset or superset thereof:
• an operating system 222, which includes procedures for handling various basic system services and for performing hardware-dependent tasks;
• a communications module 224 that is used to connect computing device 200 to other computers and devices via one or more communication network interfaces 204 (wired or wireless), such as the Internet, other wide area networks , local networks, metropolitan networks, and so on;
• a web browser 226 (or other application capable of displaying web pages), which allows a user to communicate over a network with remote computers or devices;
• an audio input module 228 (for example, a microphone module) to process the audio captured by the audio input device 220. The captured audio can be sent to a remote server and / or processed by an application running on computing device 200 (for example, the data visualization application 230);
• a data visualization application 230 to generate data visualizations and related aspects. Application 230 includes a graphical user interface 232 (for example, the graphical user interface 100 illustrated in Figure 1) for a user to build visual graphics. For example, a user selects one or more data sources 240 (which can be stored on computing device 200 or stored remotely), selects data fields from the data source (s), and uses the selected fields to define a visual graph; and • zero or more databases or data sources 240 (for example, a first data source 240-1 and a second data source 240-2), which are used by the data visualization application 230. In some implementations ,
the data sources are stored as spreadsheet files, CSV files, text files, JSON files, XML files or flat files, or stored in a relational database.
[080] In some implementations, the data visualization application 230 includes a data visualization generation module 234, which obtains user input (for example, a visual specification 236), and generates a corresponding visual graph. The data visualization application 230 then displays the generated visual graph in the user interface 232. In some implementations, the data visualization application 230 runs as a stand-alone application (for example, a desktop application). In some implementations, the data visualization application 230 runs within the web browser 226 or in another application using web pages provided by a web server (for example, a server-based application).
[081] In some implementations, the information the user provides (for example, user input) is stored as a 236 visual specification. In some implementations, the visual specification 236 includes previous natural language commands received from a user or properties specified by the user through natural language commands.
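Purely as an illustration of the kind of information such a visual specification might hold, the sketch below models it as a small dictionary; the field names are hypothetical and not limiting.

# Hypothetical shape of a visual specification (236); an actual
# implementation may store more, fewer, or differently named fields.
visual_spec = {
    "data_source": "past_home_sales_seattle",
    "columns": ["last_sale_date"],                 # elements on the column shelf
    "rows": ["last_sale_price"],                   # elements on the row shelf
    "filters": {"neighborhood": "Ballard"},
    "nl_command_history": ["houses in Ballard"],   # previous natural language commands
}
print(visual_spec["nl_command_history"][-1])       # most recent command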
[082] In some implementations, the data visualization application 230 includes a language processing module 238 for processing (for example, interpreting) commands provided by a user of the computing device. In some implementations, the commands are natural language commands (for example, captured by the audio input device 220). In some implementations, the language processing module 238 includes sub-modules, such as an autocomplete module, a pragmatics module and an ambiguity module, each of which is discussed in more detail below.
[083] In some implementations, memory 206 stores metrics and/or scores determined by the language processing module 238. In addition, memory 206 can store thresholds and other criteria, which are compared with the metrics and/or scores determined by the language processing module 238. For example, the language processing module 238 can determine a relatedness metric (discussed in detail below) for an analytical word/phrase from a received command. Then, the language processing module 238 can compare the relatedness metric with a threshold stored in memory 206.
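As a toy illustration of that comparison, the sketch below scores how related a query term is to a data attribute with a simple character-trigram overlap and checks the score against a stored threshold; both the scoring function and the threshold value are placeholders for whatever semantic model and criteria a real implementation would use.

RELATEDNESS_THRESHOLD = 0.3  # hypothetical value stored in memory 206

def relatedness(term, attribute):
    # Placeholder metric: Jaccard overlap of character trigrams.
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    a, b = grams(term.lower()), grams(attribute.lower())
    return len(a & b) / len(a | b) if (a | b) else 0.0

score = relatedness("price", "sale_price")
print(score, score >= RELATEDNESS_THRESHOLD)  # 0.375 True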
[084] Each of the executable modules, applications or sets of procedures identified above can be stored in one or more of the memory devices mentioned above, and corresponds to a set of instructions for performing a function described above. The modules or programs identified above (ie, instruction sets) do not need to be implemented as separate program software, procedures or modules, and in this way, several subsets of these modules can be combined or otherwise reordered in the various implementations. In some implementations, memory 206 stores a subset of the modules and data structures identified above. In addition, memory 206 can store additional modules or data structures not described above.
[085] Although Figure 2 illustrates a computing device 200, Figure 2 is conceived more as a functional description of the various aspects that may be present rather than as a structural schematic of the implementations described here. In practice, and as recognized by those skilled in the art, the items presented separately could be combined and some items could be separated.
[086] Figure 3A is a process flow chart illustrating a process or
a framework for using natural language, applying pragmatic principles, for visual analysis of a data set according to some implementations. The framework, based on a conversational interaction model, extends the centering approach employed in pragmatics theory to support inter-sentential transition states to continue, retain and shift the context of the data attributes in play. The framework supports a “visual analysis cycle”, an interface that supports fluid iterative exploration and refinement in visual analytics. Interaction with visual analysis is most effective when users can focus on answering the questions they have about their data, rather than on how to operate the interface of the analysis tool. Pragmatics is particularly important for the flow of visual analysis, where questions and insights often arise from previous questions and patterns of data that a person observes.
[087] Sequences of utterances that exhibit coherence form a conversation. Coherence is a semantic property of conversation, based on the interpretation of each individual utterance in relation to the interpretation of other utterances. As mentioned earlier, in order to correctly interpret a set of utterances, the process framework uses and extends a model generally used for discourse structure called conversational centering, according to some implementations. In this model, the utterances are divided into constituent speech segments, incorporating relations that can be valid between two segments. A center refers to entities serving to link an utterance to other utterances in the discourse. Consider a speech segment DS with utterances U1, ..., Um. Each utterance Un (1 ≤ n ≤ m) in DS is assigned a set of prospective centers, Cf(Un, DS), referring to the current focus of the conversation; each utterance other than the initial utterance of the segment is assigned a set of
regressive centers, Cb(Un, DS). The set of regressive centers of a new utterance Un+1 is Cb(Un+1, DS), which is equal to the prospective centers of Un (that is, Cf(Un, DS)). In the context of visual analytic conversations, prospective and regressive centers include data attributes and values, visual properties and analytical actions (for example, filtering, highlighting).
[088] Each speech segment presents both global coherence, that is, the global context of the entire conversation, generally referring to a topic or theme of the conversation, and local coherence, that is, coherence between the utterances within that conversation. Local coherence refers to the inference of a sequence of utterances within a local context through the transition states of continuing, retaining and shifting between Cf(Un, DS) and Cb(Un, DS). The framework extends this theory of conversational centering to visual analytical conversation by introducing a set of rules for each of these constructs of local coherence, according to some implementations.
[089] Given an utterance Un, a system implementing this framework responds by performing a series of analytical functions derived from the prospective centers Cf(Un, DS). An analytical function F(X, op, v) consists of a variable X (which can be an attribute or a visualization property), an operator op, and a value v (usually a constant), according to some implementations. For example, when the user says “measles in the uk”, the system creates two functions, namely, F_CAT(diseases, ==, measles) and F_CAT(country, ==, uk). When the user provides a new utterance Un+1, the system first creates a set of temporary centers Ctemp(Un+1, DS) from Un+1 without considering any previous context. The system then applies a set of rules to create a set of prospective centers, Cf(Un+1, DS), based on operations defined between Cb(Un+1, DS) and Ctemp(Un+1, DS). The prospective centers are then used to respond to the user's utterance according to some implementations.
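A minimal sketch of how such analytical functions F(X, op, v) might be represented and evaluated against a row of the data table; the dataclass below is an assumption of this sketch, and the two example instances mirror the “measles in the uk” functions above.

from dataclasses import dataclass
from typing import Any

@dataclass
class AnalyticFunction:
    variable: str   # data attribute or visualization property (X)
    op: str         # operator, e.g. "==", "<", ">="
    value: Any      # usually a constant (v)

    def matches(self, row):
        # Evaluate the function against one row of the data table.
        ops = {"==": lambda a, b: a == b, "<": lambda a, b: a < b,
               ">": lambda a, b: a > b, "<=": lambda a, b: a <= b,
               ">=": lambda a, b: a >= b}
        return ops[self.op](row[self.variable], self.value)

f1 = AnalyticFunction("diseases", "==", "measles")   # F_CAT(diseases, ==, measles)
f2 = AnalyticFunction("country", "==", "uk")         # F_CAT(country, ==, uk)
row = {"diseases": "measles", "country": "uk", "cases": 4400}
print(f1.matches(row) and f2.matches(row))           # True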
[090] Figure 3A illustrates this process for two utterances Un (302) and Un+1 (312) according to some implementations. The system creates (304) a set of temporary centers Ctemp(Un) (306) from Un without considering any previous context. At the time of receiving the utterance Un, the system has either pre-calculated a set of regressive conversation centers Cb(Un) (301) based on a previous utterance Un-1, or, if Un is the first utterance, initialized Cb(Un) to the empty set. The system then applies (308) one or more transition rules (described below with reference to Figure 3B) to derive a set of prospective centers Cf(Un) (310) from the regressive centers Cb(Un) (301) and the temporary conversation centers Ctemp(Un) (306). The prospective conversation centers Cf(Un) (310) are also the regressive conversation centers Cb(Un+1) for the next utterance Un+1 (312). The process described so far is repeated for the next utterance Un+1 (312). The system calculates (314) a set of temporary centers Ctemp(Un+1) (316) from Un+1 without considering any previous context. The system then applies (318) one or more transition rules to derive a set of prospective centers Cf(Un+1) (319) from the regressive centers Cb(Un+1) (310) and the temporary conversation centers Ctemp(Un+1) (316). The system uses the prospective conversation centers Cf(Un+1) (319) to update one or more data visualizations according to some implementations. The system also uses the set of prospective centers Cf(Un+1) (319) as the regressive conversation centers Cb(Un+2) for the next utterance Un+2, and so on. When the user moves to a different data set or restarts the visualization, the system updates the global coherence of the analytical conversation and removes all previous states (including prospective conversation centers, regressive conversation centers and temporary conversation centers), according to some implementations.
[091] Figure 3B is a state machine diagram that illustrates states
of the conversation centers and the transitions between states, according to some implementations. State 322 encapsulates the regressive conversation centers Cb and the temporary conversation centers Ctemp, and each of the states 324, 326 and 328 represents a different state of the prospective conversation centers Cf. The conversation centers correspond to an utterance Un+1 (not illustrated). What follows is a description of each of the transition rules, that is, when each transition occurs and how the final states are calculated.
[092] The Continue 323 transition continues the context from the regressive centers Cb to the prospective centers Cf, according to some implementations. In other words, each of the conversation centers in the regressive conversation centers Cb is included in the prospective conversation centers Cf (324). Using set notation, for a given utterance Un+1 in a speech segment DS, as a result of this transition,
Cb(Un+1, DS) ⊆ Cf(Un+1, DS), along with other entities. [093] This transition occurs when a variable X is in Ctemp(Un+1, DS) but not in Cb(Un+1, DS), according to some implementations. In this case, the system performs the following union operation:
Cf(Un+1, DS) = Cb(Un+1, DS) ∪ Ctemp(Un+1, DS) [094] The Retain 325 transition retains the context from the regressive centers Cb (322) in the prospective centers Cf (326) without adding additional entities to the prospective centers, according to some implementations; that is,
Cf(Un+1, DS) = Cb(Un+1, DS) [095] The Retain 325 transition is triggered when the variable X is in Cb(Un+1, DS), but not in Ctemp(Un+1, DS), according to some implementations.
[096] In some implementations, with the Shift 327 transition, the context shifts from regressive conversation centers 322 to prospective conversation centers 328; that is,
Cf(Un+1, DS) ≠ Cb(Un+1, DS) [097] In some implementations, the Shift 327 transition occurs when the variable X is in both Cb(Un+1, DS) and Ctemp(Un+1, DS), but the corresponding values are different. In this case, the system replaces all regressive centers in Cb(Un+1, DS) containing X with the corresponding centers in Ctemp(Un+1, DS). As Figure 3B illustrates, this substitution can be represented using the equation:
Cf(Un+1, DS) = Cb(Un+1, DS) − X_Cb + X_Ctemp [098] In some implementations, the Shift 327 transition also occurs when a filter constraint is removed; for example, removing a widget for “measles” shifts the “disease” variable from measles to all diseases.
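The three transitions of Figure 3B can be sketched as a single pass over the backward and temporary center sets; the representation of a center set as a mapping from a variable X to its value is an assumption of this sketch, not a limitation of the method (in particular, the filter-removal case of the Shift transition is omitted).

def derive_prospective_centers(cb, ctemp):
    # cb and ctemp map a variable X (data attribute or visualization
    # property) to its value in Cb(Un+1, DS) and Ctemp(Un+1, DS).
    cf = {}
    for var, value in cb.items():
        if var not in ctemp:
            cf[var] = value              # Retain: X is in Cb but not in Ctemp
        elif ctemp[var] != value:
            cf[var] = ctemp[var]         # Shift: X is in both, values differ
        else:
            cf[var] = value              # same value: carried over unchanged
    for var, value in ctemp.items():
        if var not in cb:
            cf[var] = value              # Continue: X is in Ctemp but not in Cb (union)
    return cf

cb = {"disease": "measles", "country": "uk"}      # after "measles in the uk"
ctemp = {"highlight": "orange spike"}             # utterance "show me the orange spike"
print(derive_prospective_centers(cb, ctemp))
# {'disease': 'measles', 'country': 'uk', 'highlight': 'orange spike'}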
[099] Figure 4A is a diagram illustrating the use of different transition rules in handling analytical conversations according to some implementations. Suppose that a system implements the framework described above with reference to Figures 3A and 3B. When the system receives an utterance 400 (for example, “measles in the uk”), the system calculates (402) prospective conversation centers (404) corresponding to the utterance, and initializes the regressive centers, according to some implementations. The conversation centers for this illustrative utterance include “measles” and “uk”. Based on the calculated conversation centers, the system applies (406) filters to update the view; for example, it applies categorical and spatial filters showing “measles in the UK” in view 408. Figure 4D shows an enlarged version of view 408. As illustrated in Figure 4D, the utterance “measles in the UK” results in the system showing a view of “Disease Outbreaks Around the World” with a focus on (DISEASES == Measles (442) and COUNTRY == United Kingdom (444)). Referring again to Figure 4A, assuming the user responds to view 408 with
a new utterance 410 (for example, “show me the orange spike”), the system calculates (412) the prospective conversation centers by applying the Continue rule (described above with reference to 323, Figure 3B). This is due to the fact that the temporary conversation centers (not shown) for this example utterance include “orange spike”, which corresponds to visualization characteristics (a type of variable), namely, a shape variable (with the value “spike”) and a color variable (with the value “orange”), and these variables are absent in the regressive conversation centers (404). Furthermore, since utterance 410 does not refer to the other variables (DISEASES or COUNTRY) in the regressive conversation centers (404), the system does not apply the Retain rule or the Shift rule. The prospective conversation centers (414) for utterance 410 are thus calculated as {measles, uk, orange spike}. Based on these prospective centers, the system updates the corresponding graphs in the data view, for example, by highlighting the orange peak in the measles line on the graph in view 418. Figure 4E shows an enlarged version of view 418. As the example shows, the system highlights and annotates (446) the peak with the words “4.4 Measles cases in 2014” in response to the utterance “show me the orange spike”, according to some implementations.
[0100] Figure 4B is another diagram further illustrating the application of different transition rules to example utterances. This example follows the user's response to view 418 in Figure 4A. Referring to Figure 4B, when the system receives an utterance 420 (for example, “mumps over there”), the system calculates (422) prospective conversation centers corresponding to the utterance, according to some implementations. The regressive conversation centers (424) (for example, {measles, uk, orange spike}) correspond to the
prospective conversation centers (414) calculated in the previous step (in response to utterance 410) illustrated in Figure 4A. The temporary conversation centers (not shown) for utterance 420 (for example, “mumps over there”) do not include a conversation center (for example, “orange spike”) derived for the previous utterance. This triggers the Retain rule (described above with reference to 325, Figure 3B). Furthermore, the temporary conversation centers include the variable DISEASES, but the value of this variable is changed from measles to mumps. This triggers the Shift rule (described above with reference to 327, Figure 3B). As a result, the system calculates the prospective conversation centers (424) (for example, {mumps, uk}). Based on these prospective conversation centers, the system responds (426) by applying the necessary filters (for example, retaining the spatial filter for “UK” and updating the categorical filter to “mumps”) and updating the view (for example, removing the highlighted peak), as illustrated in view 428. Figure 4F shows an enlarged version of view 428. As illustrated in Figure 4F, the utterance “mumps over there” results in the system showing a visualization of “Disease Outbreaks Around the World” with a focus on (DISEASES == Mumps (448) and COUNTRY == United Kingdom (450)).
[0101] Referring again to Figure 4B, to continue the example, suppose the user follows up with another utterance (430) (for example, “measles epidemic in malawi congo angola”); the system calculates (432) the prospective centers corresponding to this utterance using the Shift rule described above. For the illustrated example, the reason for applying the Shift rule is that the values of the DISEASES variable are now different, namely “measles” and “epidemic”;
similarly, the geographical region was changed from “UK” to “Malawi, Congo and Angola”. Based on the application of the Shift rule, the system calculates the prospective conversation centers (434) (for this example, the centers are {measles, epidemic, malawi, congo, angola}). Based on the prospective conversation centers (434), the system responds (436) to the utterance by applying appropriate filters (for example, a categorical filter for the measles epidemic, and a new spatial filter on Malawi, Congo and Angola, replacing the United Kingdom), thereby generating visualization 438 (for example, showing the measles epidemic in Malawi, Congo and Angola). Figure 4G shows an enlarged version of view 438. As illustrated in Figure 4G, the utterance “measles epidemic in malawi congo angola” results in the system showing a visualization of “Disease Outbreaks Around the World” with a focus on (DISEASES == Measles (452), IMPACT == Epidemic (454), COUNTRY == Malawi (456), COUNTRY == Congo (458) and COUNTRY == Angola (460)).
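Using the toy derive_prospective_centers sketch introduced after Figure 3B above (and reusing the function defined there), the last step of this dialogue, a Shift on the disease and country variables together with a Continue on the new impact variable, might be traced as follows; the attribute names and the list-valued country entry are assumptions of the sketch.

cb = {"disease": "mumps", "country": "uk"}            # centers after "mumps over there"
ctemp = {"disease": "measles", "impact": "epidemic",
         "country": ["malawi", "congo", "angola"]}    # "measles epidemic in malawi congo angola"
print(derive_prospective_centers(cb, ctemp))
# {'disease': 'measles', 'country': ['malawi', 'congo', 'angola'], 'impact': 'epidemic'}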
[0102] Figure 4C illustrates the updates to the data visualizations in response to the user's statements described above with reference to Figures 4A and 4B, according to some implementations. Views 408, 418, 428 and 438 have been described above with reference to Figures 4D, 4E, 4F and 4G, respectively.
[0103] Figure 5 is a diagram illustrating a general framework for applying the principles of pragmatics to visual analytics according to some implementations. The framework uses and extends a model generally used for discourse structure called conversational centering, according to some implementations. In this model, the utterances are divided into constituent speech segments, incorporating relationships that can be
valid between two segments. A center refers to entities serving to link an utterance to other utterances in the discourse. Consider a speech segment DS with utterances U1, ..., Um. Each utterance Un (1 ≤ n ≤ m) in DS is assigned a set of prospective centers, Cf(Un, DS), referring to the current focus of the conversation; each utterance other than the initial utterance of the segment is assigned a set of regressive centers, Cb(Un, DS). The set of regressive centers of a new utterance Un+1 is Cb(Un+1, DS), which is equal to the prospective centers of Un (that is, Cf(Un, DS)). In the context of visual analytic conversations, prospective and regressive centers include data attributes and values, visual properties and analytical actions (for example, filtering, highlighting).
[0104] Given an utterance Un, a system implementing this framework responds by executing a series of analytical functions derived from the prospective centers Cf(Un, DS). An analytical function F(X, op, v) consists of a variable X (for example, an attribute), an operator op, and a value v. For example, when the user says “measles in the uk”, the system creates two functions, namely, F_CAT(diseases, ==, measles) and F_CAT(country, ==, uk). When the user provides a new utterance Un+1, the system first creates a set of temporary centers Ctemp(Un+1, DS) from Un+1 without considering any previous context. The system then applies a set of rules to create a set of prospective centers, Cf(Un+1, DS), based on operations defined between Cb(Un+1, DS) and Ctemp(Un+1, DS). The prospective centers are then used to respond to the user's utterance according to some implementations.
[0105] Figure 5 illustrates this process for two utterances Un (500) and Un+1 (520) according to some implementations. The system calculates (526) a set of temporary centers Ctemp(Un+1) (528) from Un+1 without considering any previous context. Upon receiving the utterance Un+1, the system has either
pre-calculated a set of regressive conversation centers Cb(Un+1) (504) based on the previous utterance Un, or, if Un was the first utterance, initialized Cb(Un+1) (504) to the empty set.
[0106] Conversational centering postulates that utterances display a connection between them. The way in which these utterances are linked together to form a conversation is called cohesion. Cohesion occurs as a result of the combination of both grammatical and lexical structures in the constituent sentences. Therefore, identification of the sentence structure is a logical starting point for resolving an utterance into one or more analytical functions applied to the visualization. The sentence structure includes both lexical and grammatical structure. In Figure 5, a system implementing this framework calculates the sentence structure for the utterance Un+1 (520) in step 522. Typically, a parser is used to calculate the sentence structure. A parser receives an input sentence (sometimes called a natural language query or command) and decomposes the input sentence into a sequence of tokens (linguistic elements) by applying a set of grammatical rules specific to a particular natural language, such as English. In some implementations, the grammatical rules can be modified to suit the environment. In some implementations, a probabilistic grammar is applied to provide a structural description of the input queries. Probabilistic grammars are useful for resolving ambiguities in sentence parsing. The probability distributions (for grammatical production rules) can be estimated from a corpus of hand-parsed sentences, for example. Some implementations derive additional syntactic structure by employing a part-of-speech (POS) tagger that assigns a sentence component, such as noun, verb or adjective, to each word (sometimes called a token). Some implementations resolve the parsed output into the salient data
attributes and corresponding categories. As the dashed lines connecting blocks 500 and 510 show, in some implementations, the system also calculates (510) the sentence structure for the utterance Un (500).
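As one concrete, non-limiting way to obtain such a part-of-speech structure, the sketch below uses the off-the-shelf NLTK tokenizer and tagger; the specification does not prescribe any particular parser or tagger, and the tag sequence shown in the comment is only indicative.

# Requires the NLTK resources named below to be downloaded once.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("houses in Ballard under 600k last summer")
print(nltk.pos_tag(tokens))
# e.g. [('houses', 'NNS'), ('in', 'IN'), ('Ballard', 'NNP'),
#       ('under', 'IN'), ('600k', 'CD'), ('last', 'JJ'), ('summer', 'NN')]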
[0107] With the sentence structure(s), the system proceeds to determine (530) the type of pragmatic form (examples of which are described below with reference to Figures 6A to 10G) and any other information related to the sentence structure (for example, linguistic elements present in one utterance but absent in another). Based on the pragmatic form and phrase structure information (532), the system then derives (534) the prospective conversation centers Cf (536) for the utterance Un+1 (520) using the temporary conversation centers (528) and the regressive conversation centers Cb (504), according to some implementations. As shown in Figure 5, even before receiving the utterance Un+1 (520), the system derives and/or displays one or more initial views (508) based on the regressive conversation centers Cb (504) by applying (506) a first set of operations (for example, applying filters) to existing views. In some implementations, based on the prospective conversation centers Cf (536), the system applies (538) a second set of operations (for example, applying filters) to update existing views or generate new views (540).
[0108] Figure 6A is a diagram illustrating the application of pragmatic principles to incomplete utterances (sometimes called ellipsis) according to some implementations. Elliptical utterances are syntactically incomplete sentence fragments that omit one or more linguistic elements. These utterances can be better understood using the previously established context. Figure 6A illustrates how the incomplete utterance “townhomes” is understood in the context of the previous utterance “houses less than 1M in Ballard”. When the system receives an
utterance Un (600), which, in this example, is the utterance “houses less than 1M in Ballard”, the system calculates prospective centers and initializes regressive centers (for the utterance Un+1) in step 602, using the methodology described above with reference to Figure 5 (step 502), according to some implementations. For the example utterance, the system calculates the set of conversation centers as {houses, ballard, 1M}. In some implementations, the system applies (606) filters to the data set based on the set of conversation centers (604), and displays a data view (608). In this example, the system applies numerical and spatial filters showing houses less than $1M in Ballard. Figure 6C shows an enlarged version of view 608. As illustrated in Figure 6C, the utterance “houses less than 1M in Ballard” results in the system showing a view of “Past Home Sales - Seattle” with a focus on (LAST_SALE_PRICE less than 1.0M (642) in Ballard (644)). Referring again to Figure 6A, in some implementations, the system also calculates the sentence structure (612) for the utterance Un in step 610 using one or more of the techniques described above with reference to step 510 in Figure 5.
[0109] In some implementations, when the system receives an utterance Un+1 (620), which in this example is the utterance “townhomes”, the system calculates (626) temporary conversation centers for Un+1 (620). For this example, the system calculates the conversation centers (628) as the set {townhomes}. In addition, the system calculates (622) the sentence structure for the utterance Un+1 (620) using the techniques described above with reference to step 522 (Figure 5), according to some implementations.
[0110] As mentioned above, elliptical utterances omit one or more linguistic elements. With the aid of the phrase structures (612 and 624), the system determines
a subset of the conversation centers of the utterance Un (600) that correspond to linguistic elements absent in the utterance Un+1 (620), according to some implementations. In this example, the system calculates the subset as the set {ballard, 1M}, since the linguistic elements, that is, a noun phrase that refers to a place after a prepositional phrase (corresponding to “ballard”) and a noun phrase that refers to a price value after another prepositional phrase (corresponding to “1M” or, more precisely, “less than 1M”), are absent in the utterance Un+1 (620), but were present in the utterance Un (600). On the other hand, the phrase “houses” in the utterance Un (600) and the phrase “townhomes” in the utterance Un+1 (620) correspond to similar linguistic elements (for example, both phrases are noun phrases and refer to house types).
[0111] In step 634, the system combines the temporary set of conversation centers, which, in this example, is the set {townhomes}, with the subset of conversation centers (632) to arrive at a set of prospective conversation centers (636) for the utterance Un+1, according to some implementations. Based on the calculated set of prospective conversation centers (636), the system determines the type of filters to be applied to the data set and applies the appropriate filters in step 638 to display an appropriate data view (640), according to some implementations. In this example, since the “ballard” and “1M” conversation centers were retained from the regressive conversation centers (604), the system retains the numeric filter (corresponding to 1M) and the spatial filter (corresponding to Ballard). In addition, since the conversation center value (corresponding to the home_type variable) has been changed from “houses” to “townhomes”, the system applies a categorical filter on home_type to show
townhomes (instead of all houses). Figure 6D shows an enlarged version of view 640. As illustrated in Figure 6D, the utterance “townhomes” results in the system showing a view of “Past Home Sales - Seattle” retaining the LAST_SALE_PRICE widget or filter (642) and the spatial filter 644 from the previous view 608, and replacing HOME_TYPE with “townhouses” (646).
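Under the toy center representation used in the sketches above, resolving the elliptical fragment “townhomes” amounts to carrying over every backward center whose variable the fragment leaves unfilled and taking the fragment's own center for the variable it does fill; the attribute names below are assumptions of this sketch.

def resolve_ellipsis(cb, ctemp):
    # Centers of the previous utterance whose variables the fragment does
    # not mention are carried over; the fragment's own centers override.
    carried = {var: val for var, val in cb.items() if var not in ctemp}
    return {**ctemp, **carried}

cb = {"home_type": "houses", "neighborhood": "Ballard", "last_sale_price": "< 1M"}
ctemp = {"home_type": "townhomes"}   # the fragment re-fills only home_type
print(resolve_ellipsis(cb, ctemp))
# {'home_type': 'townhomes', 'neighborhood': 'Ballard', 'last_sale_price': '< 1M'}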
[0112] Figure 6B illustrates the updates to the data visualizations in response to the user's statements described above with reference to Figure 6A, according to some implementations. Views 608 and 640 were described above with reference to Figures 6C and 6D, respectively.
[0113] Figure 7A is a diagram illustrating the application of pragmatic principles to utterances with reference expressions (sometimes referred to here as anaphoric references) according to some implementations. Reference expressions help to unify the text and achieve economy, avoiding unnecessary repetition. Referencing is a form of conversation that, instead of being interpreted semantically by itself, makes reference to something else for its interpretation. When the interpretation is within the text, this is known as anaphoric referencing. In interaction with visual analytics, the reference pertains to data attributes and analytical functions. Figure 7A illustrates how the utterance “previous year” is understood in the context of the previous utterance “prices in 2015”. When the system receives an utterance Un (700), which in this example is the utterance “prices in 2015”, the system calculates prospective centers and initializes regressive centers (for the utterance Un+1) in step 702, using the methodology described above with reference to Figure 5 (step 502), according to some implementations. For the example utterance, the system calculates the set of conversation centers as {prices, 2015}. In some implementations, the system applies (706)
filters to the data set based on the set of conversation centers (704), and displays a data view (708). In this example, the system applies time filters showing house prices in the year 2015. Figure 7C shows an enlarged version of view 708. As illustrated in Figure 7C, the utterance “prices in 2015” results in the system showing a view of “Past Home Sales - Seattle” in 2015. Although not illustrated, a previous view, for example, in response to a previous utterance, caused the system to show prices in Seattle. Referring again to Figure 7A, when the system receives an utterance Un+1 (720), which, in this example, is the utterance “previous year”, the system calculates (722) the sentence structure for the utterance Un+1 (720) using the techniques described above with reference to step 522 (Figure 5), according to some implementations.
[0114] As mentioned above, reference expressions with anaphoric references refer to something else within the text. Based on the sentence structure (724), the system identifies (726) anaphors in the utterance Un+1 (720), according to some implementations. In this example, the system identifies the anaphor (728) “previous”. Using the identified anaphor, the system then identifies (734) the phrasal block (732) containing the reference in order to identify the entities to which the reference is referring, according to some implementations. For the illustrated example, the system identifies the phrasal block “year” that corresponds to the anaphor “previous”. Based on the identified anaphor and phrasal block, in step 730, the system searches through the regressive centers to find such entities and replaces the anaphoric reference with these entities, according to some implementations. Additionally, in some implementations, as is the case in this example, the system also detects and applies appropriate functions to the entity's value. For the example shown, the system
also detects that the user is referring to the “previous” year, and, therefore, the value 2015 is decremented by 1 to arrive at the correct value for the variable “year”. The system calculates the date for “previous” using a temporal function (for example, DATECALC), according to some implementations. The system arrives at a set of prospective conversation centers (736), which, for this example, is the set {prices, 2014}. Based on this set, the system performs the necessary steps to update the view in step 738, according to some implementations. For this example, the system retains the year reference and updates the time filter to 2014, to show view 740. Figure 7D shows an enlarged version of view 740. As illustrated in Figure 7D, the utterance “previous year” results in the system showing a view of “Past Home Sales - Seattle” in 2014, the year prior to 2015 (the year from the previous view).
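A toy sketch of the temporal part of that resolution: once the anaphor “previous” has been attached to the phrasal block “year”, the new value can be computed from the value held in the backward centers. The helper below is only illustrative and handles just the anaphors used in these examples; the specification refers to a temporal function such as DATECALC for the general case.

def resolve_temporal_anaphor(anaphor, phrasal_block, backward_centers):
    # Only the anaphors used in the examples above are handled here.
    step = {"previous": -1, "next": +1}[anaphor]
    if phrasal_block == "year":
        return {**backward_centers, "year": backward_centers["year"] + step}
    raise ValueError(f"unsupported phrasal block: {phrasal_block}")

cb = {"measure": "prices", "year": 2015}          # centers after "prices in 2015"
print(resolve_temporal_anaphor("previous", "year", cb))
# {'measure': 'prices', 'year': 2014}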
[0115] Figures 7B to 7D illustrate graphical user interfaces (related to Figure 7A) for interactive data analysis using natural language processing in a data visualization application according to some implementations. Figure 7B illustrates the updates to the data visualizations in response to the user's statements described above with reference to Figure 7A, according to some implementations. Views 708 and 740 were described above with reference to Figures 7C and 7D, respectively.
[0116] Although not shown in Figure 7A, in some implementations, a system repeats these steps to recognize multiple anaphoric references in a single expression. Additionally, in some implementations, the system identifies many types of anaphoric references in a given utterance, such as “that”, “those”, “them”, “ones”, “previous”, and “next”.
As another illustrative example, consider the utterance “Show fremont, queen anne, and ballard” followed by the utterance “condos in those districts”. In this example, “those” refers to certain values (i.e., fremont, queen anne, and ballard) of the neighborhood attribute, as indicated by the word “districts”.
[0117] In some implementations, references refer to values of a data attribute. In some implementations, references refer to actions that need to be performed by the system. For example, consider the utterance “filter out ballard” followed by “do that to fremont”. Here, the word “that” is not immediately followed by any noun, but is immediately preceded by the verb “do”. In such cases, the system determines one or more actions mentioned in the previous utterance, which, in this example, is the “filter out” action.
[0118] In some implementations, the system supports references that are located outside the text, in the context of the visualization. In some of these implementations, the prospective center Cf makes reference to the context within the visualization, instead of the text in the regressive center Cb. In some implementations, this form of indirect referencing includes a deictic reference that refers to an object in the environment, usually by pointing. In some of these implementations, the system supports deictic references because it allows multimodal interaction (mouse + speech/text). Figure 7E shows an illustrative view in response to a deictic reference. In some implementations, this form of indirect referencing includes a visualization property reference that uses properties in the visualization, such as mark properties, text on labels, geometric axes and titles. Figure 7F shows an example of a visualization in response to a reference to a visualization property.
[0119] Figure 8A is a diagram illustrating the application of pragmatic principles to utterances with conjunctions according to some implementations. Conjunctions in utterances communicate a range of relationships between sentence fragments called conjuncts. In a conversation, users tend to iteratively construct a compound query by adding multiple conjuncts, generally avoiding the explicit use of conjunctions and connectors, such as “and”, “or” and “besides”, between phrases. Figure 8A illustrates how the utterance “houses in Ballard under 600k last summer” is understood in the context of the previous utterance “houses in Ballard”. When the system receives an utterance Un (800), which in this example is the utterance “houses in Ballard”, the system calculates prospective centers and initializes regressive centers (for the utterance Un+1) in step 802, using the methodology described above with reference to Figure 5 (step 502), according to some implementations. For the example utterance, the system calculates the set of conversation centers as {houses, ballard}. In some implementations, the system applies (806) filters to the data set based on the set of conversation centers (804), and displays a data view (808). In this example, the system applies categorical and spatial filters showing houses in Ballard.
[0120] When the system receives an utterance Un+1 (810), which, in this example, is the utterance “houses in Ballard under 600k last summer”, the system calculates (812) the sentence structure for the utterance Un+1 (810) using the techniques described above with reference to step 522 (Figure 5), according to some implementations. As mentioned above, a compound query consists of multiple conjuncts (sometimes implicit) between the constituent phrases. Based on the sentence structure (814), the system identifies (816) the conjuncts in the utterance Un+1 (810), according to some
implementations. In this example, the system identifies multiple conjuncts (818), that is, “houses”, “in Ballard”, “under 600k” and “last summer”. Based on these conjuncts (818) and the set of regressive conversation centers (804), the system calculates (820) a set of prospective conversation centers (822) according to some implementations. For example, the system selects the corresponding context from the utterance Un (800), and adds the new conversation centers derived from the conjuncts of the utterance Un+1 (810). The set of prospective conversation centers (822), for this example, is the set {houses, ballard, <600k, last summer}. Based on this set, the system takes the necessary steps to refine the current view in step 824, according to some implementations. For this example, the system applies a numeric filter to the house price and a time filter to show last summer, to show the visualization in 826. Figure 8B shows an enlarged version of visualization 826. As shown in Figure 8B, the utterance “houses in Ballard under 600k last summer” results in the system showing a visualization of “Past Home Sales - Seattle” in Ballard (828), with LAST_SALE_PRICE (830) below 600K (832), last summer. For this example, the system additionally resolves the time period (for “last summer”) based on the previous view, such as the time period 2015-6 to 2015-8-31 (834), according to some implementations.
[0121] Figure 8C illustrates how a system iteratively connects the analytical functions of adjacent nodes in a parse tree during linearization, according to some implementations. Finding the implicit data coherence between conjuncts is sometimes a challenging task. In the example shown in Figure 8B, all conjuncts refer to the same entity “houses in Ballard”.
However, there are cases where conjuncts map to different entities. An example utterance is “houses in Ballard under 600k condos in South Lake Union”. The system determines whether the individual conjuncts resolve to the same entity or to different entities, according to some implementations. In some such implementations, the system employs a rule-based technique that takes a potentially long utterance with possibly implicit conjunctions and translates the utterance into a set of analytical functions chained together by logical operators. The system then executes these analytical functions in response to the user's utterance, according to some implementations.
[0122] In some implementations, the system resolves multiple conjuncts within compound utterances to invoke one or more corresponding analytical functions through a linearization process. In some such implementations, an analytical function F(X, op, v) consists of a variable X (for example, an attribute), an operator op, and a value v. Each attribute is either categorical or ordered. Ordered data types are further categorized into ordinal and quantitative. The linearization process considers the types of the attributes and operators to combine analytical functions using logical operators (that is, ∧ and ∨), as described below.
[0123] Applying the ∨ operator: When two or more adjacent conjuncts share an attribute and the data type of that attribute is categorical, then the system connects these conjuncts with ∨, according to some implementations. Similarly, if this shared attribute is ordered and the function operator is ==, the system applies ∨, according to some implementations. In such cases, ∨ is logically the more appropriate choice, as applying ∧ would match no item in the data table. For example, if the utterance
is “show me condos and townhomes”, then the system generates the following combination of analytical functions: (F_CAT(homeType, ==, condo) ∨ F_CAT(homeType, ==, townhome)), according to some implementations. In this example, both “condo” and “townhome” belong to the same categorical attribute, that is, homeType. Since a particular house (item) cannot be both a “condo” and a “townhome” at the same time, applying the ∨ operator is logically more appropriate than applying the ∧ operator. Similarly, if the user issues “2 3 bedroom houses”, the system generates (F_ORDINAL(bed, ==, 2) ∨ F_ORDINAL(bed, ==, 3)), according to some implementations. The ∨ operator is also appropriate if the attribute type is ordered and the utterance involves the conditions X < v1 and X > v2, where v1 < v2. For example, if the utterance is “before 2013 and after 2014”, then the ∨ operator will be used between the two conjuncts, according to some implementations. Again, in this case, applying the ∧ operator would result in no match for items in the data table.
[0124] Applying the ∧ operator: The ∧ operator is appropriate if the attribute type is ordered and the utterance involves the conditions X > v1 and X < v2, where v1 < v2. For example, “houses over 400k and under 700k” resolves to (F_NUMERIC(price, >, 400000) ∧ F_NUMERIC(price, <, 700000)). “Beds between 2 to 4” resolves to (F_ORDINAL(beds, >=, 2) ∧ F_ORDINAL(beds, <=, 4)). Note that applying the ∨ operator would result in matching all items in the data table. In some implementations, the ∧ operator is also applied when there is no common attribute between two conjuncts. For example, the utterance “price under 600k with 2 beds” resolves to (F_ORDINAL(beds, ==, 2) ∧ F_NUMERIC(price, <=, 600000)).
[0125] In order to generate the analytical-function representation of an utterance, the system traverses the parse tree corresponding to the utterance, produced by the parser (for example, the parser described above with reference to Figure 5), in post-order, and applies the two rules above iteratively to the phrases, as shown in Figure 8C. For the examples illustrated in Figure 8C, the system uses the utterance “condos under 600K townhomes under 1M” as input and iteratively applies the rules above to generate the chained analytical functions.
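By way of illustration only, the following sketch captures the operator-selection portion of the linearization rules above. The names used (AnalyticFn, connective) are hypothetical and do not appear in the specification; the described system additionally applies these rules while traversing the parse tree in post-order, which yields groupings such as (condo ∧ price < 600K) ∨ (townhome ∧ price < 1M) for the Figure 8C example.

```python
# Minimal, illustrative sketch (not the patented implementation) of the
# operator-selection rules from paragraphs [0122]-[0125].
from dataclasses import dataclass

@dataclass
class AnalyticFn:
    attribute: str   # e.g. "homeType", "price", "year"
    kind: str        # "categorical", "ordinal", or "quantitative"
    op: str          # "==", "<", ">", "<=", ">="
    value: object

def connective(f1: AnalyticFn, f2: AnalyticFn) -> str:
    """Choose AND or OR for two adjacent filter sets."""
    if f1.attribute != f2.attribute:
        return "AND"                 # no shared attribute -> conjunction
    if f1.kind == "categorical":
        return "OR"                  # same categorical attribute -> disjunction
    if f1.op == "==" and f2.op == "==":
        return "OR"                  # e.g. "2 3 bedroom houses"
    # Shared ordered attribute with one lower and one upper bound:
    lo = f1.value if f1.op.startswith(">") else f2.value   # value of the ">" bound
    hi = f1.value if f1.op.startswith("<") else f2.value   # value of the "<" bound
    # Bounded range (X > v1 AND X < v2 with v1 < v2) -> AND; otherwise -> OR.
    return "AND" if lo < hi else "OR"

# Examples from the description:
assert connective(AnalyticFn("homeType", "categorical", "==", "condo"),
                  AnalyticFn("homeType", "categorical", "==", "townhome")) == "OR"
assert connective(AnalyticFn("price", "quantitative", ">", 400_000),
                  AnalyticFn("price", "quantitative", "<", 700_000)) == "AND"
assert connective(AnalyticFn("year", "ordinal", "<", 2013),
                  AnalyticFn("year", "ordinal", ">", 2014)) == "OR"
assert connective(AnalyticFn("beds", "ordinal", "==", 2),
                  AnalyticFn("price", "quantitative", "<", 600_000)) == "AND"
```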
[0126] Figure 9A is a diagram illustrating the application of pragmatic principles to handle lexical cohesion, according to some implementations. The three previous types of pragmatics - ellipsis, referencing, and conjunction - provide grammatical cohesion to the conversation. In addition to these grammatical constructs, users often express concepts through the senses of related words in conversation, a phenomenon called lexical cohesion. These word relationships can be as simple as variations in spelling, stem, and plurality (for example, “profit” and “profits”), or synonyms (for example, “country” and “nation”), up to related terms that co-occur (for example, “violence” and “crime”). In general, word senses are related to one another within a semantic context.
[0127] Figure 9A illustrates how the utterance “the cheapest” is understood in the context of the previous utterance “most expensive houses in Queen Anne”. When the system receives an utterance U_n (900), which in this example is “most expensive houses in Queen Anne”, the system calculates prospective centers and initializes regressive centers (for the utterance U_{n+1}) in step 902, using the methodology described above with reference to Figure 5 (step 502), according to some implementations. For the example utterance, the system calculates the set of conversation centers as {most expensive, houses, Queen Anne}. In some implementations, the system maps (906) one or more conversation centers to a corresponding analytical function to generate a data visualization (908). In the illustrated example, the system maps “most expensive” to the analytical function TOP_N(sale_price) applied to “houses”. Some implementations also annotate the price range for clarity. In this example, the system applies categorical and spatial filters showing houses in Queen Anne. Figure 9B shows an enlarged version of view 908. As shown in Figure 9B, the utterance “most expensive houses in Queen Anne” results in the system showing a “Past Home Sales - Seattle” visualization comprising the top 10% of LAST_SALE_PRICE (928) in Queen Anne (930), according to some implementations.
[0128] When the system receives an utterance U_{n+1} (910), which in this example is the utterance “the cheapest”, the system calculates (912) the sentence structure (914) for U_{n+1} using the techniques described above with reference to step 522 (Figure 5), according to some implementations. As mentioned above, a user's utterance sometimes contains word senses that are better understood in the context of previous utterances. Based on the sentence structure (914), the system identifies (916) candidates for lexical cohesion in the utterance U_{n+1} (910), according to some implementations. In this example, the system identifies the candidate “cheapest” (918) for the cohesion analysis. Based on the one or more identified cohesion candidates (918) and the set of regressive conversation centers (904), the system calculates (920) a set of prospective conversation centers (922), according to some implementations. For the illustrated example, the system calculates the semantically related data attribute (for example, sale_price) corresponding to the lexical cohesion candidates (for example, “most expensive” or “cheapest”), replacing the relevant numerical attributes while carrying over the rest of the context from the utterance U_n (904), according to some implementations.
[0129] In some implementations, the system identifies attribute word senses using the Word2vec® model, which contains vector representations learned from a large text corpus by computing word vectors with a recurrent neural network. In some implementations, the semantic relatedness S_rel between a word w_i in a given utterance and a data attribute d_j is the maximum value of a score calculated as follows:
S_rel(w_i, d_j) = max_{s_i,m, s_j,n} [ λ · cos(v_wi, v_dj) + (1 − λ) · dist(s_i,m, s_j,n) ]    (1)
In formula (1), dist(s_i,m, s_j,n) is the Wu-Palmer distance between the two word senses s_i,m and s_j,n; v_wi and v_dj are the vector representations of w_i and d_j, respectively; and λ is a weighting factor applied to the pairwise cosine distance between the vectors.
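As an illustration only, the following sketch computes a score in the spirit of formula (1) using publicly available tools: word vectors loaded with the gensim library and WordNet senses from NLTK stand in for the Word2vec® vectors and word senses. The weighting factor 0.7, the vector file, and the function names are assumptions, not values from the specification.

```python
# Requires: gensim, nltk (with the "wordnet" corpus downloaded), and a
# pre-trained word2vec file such as GoogleNews-vectors-negative300.bin
# (an assumed, publicly available resource).
from gensim.models import KeyedVectors
from nltk.corpus import wordnet as wn

kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def semantic_relatedness(word: str, attribute: str, lam: float = 0.7) -> float:
    """S_rel(word, attribute): maximum over sense pairs of a weighted combination
    of vector cosine similarity and Wu-Palmer sense similarity."""
    cos = kv.similarity(word, attribute) if word in kv and attribute in kv else 0.0
    best = lam * cos                       # score even when no WordNet senses exist
    for s_i in wn.synsets(word):
        for s_j in wn.synsets(attribute):
            wup = s_i.wup_similarity(s_j) or 0.0
            best = max(best, lam * cos + (1.0 - lam) * wup)
    return best

# "cheap" should relate more strongly to a price attribute than to, say, "bed".
print(semantic_relatedness("cheap", "price"), semantic_relatedness("cheap", "bed"))
```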
[0130] The Word2vec® model is used here as an example only. A number of other neural network models can be used to identify word senses, such as Stanford University's GloVe®. Some libraries, such as GenSim® and Deeplearning4j®, offer a choice of word vector representation models in a single package.
[0131] In some implementations, the system not only calculates the semantic relatedness between terms and data attributes, but also determines the type of analytical function associated with each term. For example, the system performs these additional steps for queries such as “show me the cheapest houses near Ballard” or “where are the mansions in South Lake Union”. The system considers the corresponding dictionary definitions as additional features for these word vectors, and checks whether the definitions contain quantitative adjectives such as “less”, “more”, “low”, and “high” using a POS tagger, according to some implementations. The system then maps appropriate analytical functions to these adjectives, according to some implementations. Figure 9B illustrates an example in which the expression “most expensive” is mapped to TOP_N(sale_price). Figure 9C illustrates another example in which the term “cheapest” is mapped to BOTTOM_N(sale_price). Figure 9D, similarly, illustrates a visualization (940) in which the term “deadliest” is mapped to the top N values of the “fatalities” attribute (942).
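The following sketch illustrates, under stated assumptions, how a superlative term could be mapped to TOP_N or BOTTOM_N by scanning its dictionary (here, WordNet) definitions for quantitative adjectives with a POS tagger; the cue lists and helper names are hypothetical and do not reproduce the described system's actual mapping.

```python
# Requires NLTK with the "wordnet", "punkt", and "averaged_perceptron_tagger" data.
import nltk
from nltk.corpus import wordnet as wn

LOW_CUES, HIGH_CUES = {"less", "low", "least", "small"}, {"more", "high", "most", "large"}

def analytic_function_for(term: str, attribute: str) -> str | None:
    """Scan the term's definitions for quantitative adjectives/adverbs and map them."""
    for synset in wn.synsets(term):
        tagged = nltk.pos_tag(nltk.word_tokenize(synset.definition()))
        for token, tag in tagged:
            if tag.startswith(("JJ", "RB")):          # adjectives / adverbs
                if token.lower() in LOW_CUES:
                    return f"BOTTOM_N({attribute})"
                if token.lower() in HIGH_CUES:
                    return f"TOP_N({attribute})"
    return None

print(analytic_function_for("cheap", "sale_price"))      # likely BOTTOM_N(sale_price)
print(analytic_function_for("expensive", "sale_price"))  # likely TOP_N(sale_price)
```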
[0132] Referring again to Figure 9A, for the running example, the system calculates the set of prospective conversation centers (922) as {the cheapest, houses, Queen Anne}, according to some implementations. Based on this set, the system takes the necessary steps to refine the current visualization in step 924 to generate an updated visualization (926), according to some implementations. For this example, the system maps “cheapest” to BOTTOM_N(sale_price) and refines the current visualization by applying a numeric filter to the house price. Figure 9C shows an enlarged version of view 926. As shown in Figure 9C, the utterance “the cheapest” results in the system showing a “Past Home Sales - Seattle” visualization (926) with the bottom 10% of LAST_SALE_PRICE (932) in Queen Anne (930), according to some implementations.
[0133] Figure 10A is a diagram illustrating the application of pragmatic principles to repair utterances, according to some implementations. In the course of a conversation, it is common for users to correct or clarify a previous utterance. In some implementations, the system supports the use of follow-up repair utterances to modify or “repair” a potentially ambiguous utterance, or to modify the default behavior of how the results are presented to the user. For example, to update the default behavior of the system, such as highlighting a selection, a user can use utterances such as “no, filter instead”. As another example, to update data attributes, a user can use utterances like “get rid of condos” or “change from condo to townhomes”, as shown in Figure 7.
[0134] Figure 10A illustrates how the utterance “remove condos” is understood in the context of the previous utterance “houses in green lake”. When the system receives an utterance U_n (1000), which in this example is the utterance “houses in green lake”, the system calculates prospective centers and initializes regressive centers (for the utterance U_{n+1}) in step 1002, using the methodology described above with reference to Figure 5 (step 502), according to some implementations. For the example utterance, the system calculates the set of conversation centers as {houses, green lake}. The system generates or updates (1006) a data visualization (1008) based on the calculated set of conversation centers (1004), according to some implementations. In some implementations, the system applies filters to the data set based on the set of conversation centers (1004) and displays a data visualization (1008). In this example, the system applies categorical and spatial filters showing houses in Green Lake. Figure 10B shows an enlarged version of view 1008. As illustrated in Figure 10B, the utterance “houses in green lake” results in the system showing a “Past Home Sales - Seattle” visualization in Green Lake (1028).
[0135] When the system receives an utterance U_{n+1} (1010), which in this example is the utterance “remove condos”, the system calculates (1012) the sentence structure for the utterance U_{n+1} (1010) using the techniques described above with reference to step 522 (Figure 5), according to some implementations. As mentioned above, a repair utterance corrects or clarifies an earlier utterance. Based on the sentence structure (1014), the system identifies (1016) the utterance U_{n+1} (1010) as being a repair utterance and then identifies the relevant repair terms (1018) within the utterance, according to some implementations. In this example, the system identifies the repair term “remove” within the utterance “remove condos”. Based on the set of one or more repair utterances and repair terms (1018), the system calculates (1020) a set of prospective conversation centers (1022), according to some implementations. For example, the system identifies conversation centers and/or data attributes in the previous utterance that relate to the one or more identified repair terms, according to some implementations. In some implementations, as illustrated in Figure 10A, the system repairs or disambiguates between the conversation centers based on the repair terms (1018). The set of prospective conversation centers (1022), for this example, is the set {houses not including condos, green lake}. Based on this set, the system performs the necessary steps (1024) to update the results from the previous visualization, according to some implementations. For this example, the system filters out condos to show view 1026. Figure 10C shows an enlarged version of view 1026. As illustrated in Figure 10C, the utterance “remove condos” results in the system showing a “Past Home Sales - Seattle” visualization in Green Lake (1028), filtering out HOME_TYPE == Condo/Coop (1030).
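For illustration, the sketch below shows one way a repair utterance such as “remove condos” could be applied to the previous conversation centers; the repair-term list, the value-to-attribute lookup, and the function names are assumptions made for the example only.

```python
# A minimal sketch (hypothetical names) of a repair utterance: detect a repair
# term, resolve the mentioned value to its data attribute, and add a negated
# center while keeping the rest of the context.
REPAIR_TERMS = ("remove", "get rid of", "exclude")

# Hypothetical lookup from categorical values to attributes, standing in for
# the data set's metadata.
VALUE_TO_ATTRIBUTE = {"condo/coop": "HOME_TYPE", "townhouse": "HOME_TYPE", "green lake": "NEIGHBORHOOD"}

def apply_repair(backward_centers: dict, utterance: str) -> dict:
    text = utterance.lower()
    if not any(term in text for term in REPAIR_TERMS):
        return dict(backward_centers)                    # not a repair utterance
    centers = dict(backward_centers)                     # carry over previous context
    utter_tokens = {t.rstrip("s") for t in text.split()}
    for value, attribute in VALUE_TO_ATTRIBUTE.items():
        value_tokens = {t.rstrip("s") for t in value.replace("/", " ").split()}
        if utter_tokens & value_tokens:                  # "condos" ~ "condo/coop"
            centers[attribute] = ("!=", value)           # e.g. HOME_TYPE != condo/coop
    return centers

centers = {"NEIGHBORHOOD": ("==", "green lake")}
print(apply_repair(centers, "remove condos"))
# {'NEIGHBORHOOD': ('==', 'green lake'), 'HOME_TYPE': ('!=', 'condo/coop')}
```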
[0136] Figure 11A is a diagram illustrating the application of pragmatic principles to manage responses and feedback, according to some implementations. A general framework for applying the principles of pragmatics to visual analytics is first described here to provide context. The framework uses and extends a model commonly used for discourse structure called conversational centering, according to some implementations. In this model, utterances are divided into constituent discourse segments, incorporating relations that can hold between two segments. A center refers to entities serving to link an utterance to other utterances in the discourse. Consider a discourse segment DS with utterances U_1, ..., U_m. Each utterance U_n (1 ≤ n ≤ m) in DS is assigned a set of prospective centers, Cf(U_n, DS), referring to the current focus of the conversation; each utterance other than the initial utterance of the segment is also assigned a set of regressive centers, Cb(U_n, DS). The set of regressive centers of a new utterance U_{n+1} is Cb(U_{n+1}, DS), which is equal to the prospective centers of U_n (that is, Cf(U_n, DS)). In the context of visual analytic conversations, prospective and regressive centers include data attributes and values, visual properties, and analytical actions (for example, filtering, highlighting).
[0137] Given an utterance U_n, a system implementing this framework responds by executing a series of analytical functions derived from the prospective centers Cf(U_n, DS). An analytical function F(X, op, v) consists of a variable X (which can be an attribute or a visualization property), an operator op, and a value v (usually a constant). For example, when the user says “measles in the uk”, the system creates two functions, namely F_CAT(diseases, ==, measles) and F_CAT(country, ==, uk). When the user provides a new utterance U_{n+1}, the system first creates a set of temporary centers Ctemp(U_{n+1}, DS) from U_{n+1} without considering any previous context. The system then applies a set of transition rules to create a set of prospective centers, Cf(U_{n+1}, DS), based on operations defined between Cb(U_{n+1}, DS) and Ctemp(U_{n+1}, DS). The prospective centers are then used to respond to the user's utterance, according to some implementations.
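As a purely illustrative sketch of this bookkeeping, the following code keeps the prospective and regressive centers as simple mappings and seeds Cb(U_{n+1}) from Cf(U_n); the class and method names are hypothetical, and the detailed merge is governed by the transition rules described later.

```python
# Small sketch of the centering bookkeeping described above (hypothetical names).
# Centers are kept as attribute -> (op, value) mappings.
from dataclasses import dataclass, field

@dataclass
class DiscourseState:
    forward: dict = field(default_factory=dict)    # Cf(U_n, DS)
    backward: dict = field(default_factory=dict)   # Cb(U_n, DS)

    def next_utterance(self, temp_centers: dict) -> None:
        """Advance to U_{n+1}: Cb(U_{n+1}) = Cf(U_n), then merge the temporary
        centers computed from the new utterance alone (transition rules such as
        CONTINUE/RETAIN/SHIFT, sketched later, decide the merge in detail)."""
        self.backward = dict(self.forward)
        self.forward = {**self.backward, **temp_centers}

state = DiscourseState()
state.next_utterance({"disease": ("==", "measles"), "country": ("==", "uk")})   # "measles in the uk"
state.next_utterance({"visual_property": ("==", "orange spike")})               # follow-up utterance
print(state.backward)   # {'disease': ('==', 'measles'), 'country': ('==', 'uk')}
```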
[0138] To support a conversation, the views presented by the system provide cohesive and relevant responses to the various utterances. Sometimes the system responds by changing the visual encoding of existing visualizations, while in other cases the system creates a new chart to support the visual analytical conversation more effectively. In addition to appropriate visualization responses, the system helps the user understand how it interpreted an utterance by producing appropriate feedback, and allows the user to rectify the interpretation through interface controls as needed. On a traditional dashboard, users interact by selecting items or attributes in a view, which are highlighted to provide immediate visual feedback; at the same time, other charts are updated by highlighting or filtering the items. In a natural language interface, however, instead of making an explicit selection by mouse or keyboard, the user mentions different attributes and values, making it a non-trivial task to decide how each view within a dashboard should respond to the utterance. Another complication arises when the system needs to support multiple views.
[0139] Figure 11A shows a methodology for generating responses, according to some implementations. To decide how the views (V) on a dashboard should respond to the utterance, a system according to some implementations proceeds as follows. The system calculates a set of prospective conversation centers (1100) Cf(U_{n+1}) corresponding to the utterance U_{n+1}, based on the conversation centers of the previous utterance U_n and on a set of temporary conversation centers calculated using only the current utterance and context. The system creates (1102) a list of all data attributes (1104). The system then determines (1106), for example by invoking a view manager, whether any of the existing views encodes a respective attribute in the list of data attributes (1104). The system then determines (1118) whether a visualization (sometimes called a view here) directly encodes the respective attribute as one of its dimensions (for example, as a facet of the visualization), that is, without using any aggregate functions such as counting or averaging. If the attribute is found to be encoded by an existing visualization V (that is, the condition checked in 1118 is true/yes), the system highlights (1122) marks matching the criteria corresponding to the respective attribute for an updated dashboard (1124). If, on the other hand, the system determines that a selected view (illustrated as the view V) does not directly encode the respective attribute, the system filters (1120) out the results that do not match the criteria for an updated dashboard (1124), according to some implementations. This is typically the case when a secondary chart applies additional data transformations to the result set (for example, a line chart or bar chart). In some such implementations, the system additionally highlights one or more results that match the criteria corresponding to the respective attribute for the updated dashboard (1124).
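The following sketch illustrates this per-view decision under simplifying assumptions: each hypothetical View object highlights when it directly encodes the attribute and filters otherwise; it is not the actual view-manager implementation.

```python
# Hedged sketch of the per-view response decision from Figure 11A.
from dataclasses import dataclass, field

@dataclass
class View:
    name: str
    dimensions: set                         # attributes the view encodes directly
    actions: list = field(default_factory=list)

    def respond(self, attribute: str, op: str, value) -> None:
        # A view that encodes the attribute as a dimension highlights; otherwise it filters.
        if attribute in self.dimensions:
            self.actions.append(("highlight", attribute, op, value))
        else:
            self.actions.append(("filter", attribute, op, value))

def respond_to_utterance(views: list, prospective_centers: dict) -> None:
    for attribute, (op, value) in prospective_centers.items():
        for view in views:
            view.respond(attribute, op, value)

dashboard = [View("map", {"country", "disease"}), View("impact bars", {"impact"})]
respond_to_utterance(dashboard, {"disease": ("==", "measles"), "country": ("==", "uk")})
print([(v.name, v.actions) for v in dashboard])
# The map highlights measles in the uk; the impact chart filters to matching items.
```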
[0140] Figure 11B is an example visualization (1170) that further illustrates the methodology presented in Figure 11A, according to some implementations. The system highlights the items that match the “measles in the uk” criteria on the map chart (1172). The system also highlights the series (1176) on the line chart (1174), and highlights the bar (1180) representing “measles” on the bar chart (1178). However, the bar chart (1182) showing impact cannot highlight any marks, as it does not encode any attribute in the attribute list (for example, the list {X_1, X_2, ..., X_m} (1104)). Therefore, the system filters out results that do not meet the “measles in the uk” criteria and updates that chart accordingly. Note that users can change the default behavior by explicitly expressing their choice of filtering vs. highlighting (for example, “delete”, “remove”, “just filter”).
[0141] During the flow of visual analysis, there may be situations in which the existing visualization cannot meet the user's growing information needs. This scenario can arise, for example, when a specific data attribute cannot be encoded effectively in the existing visualization (for example, time values on a map), justifying the creation of a new visualization in response. Taking inspiration from work that connects visualization with language specification, the system supports the creation of different types of visualizations (for example, bar chart, line chart, map chart, and scatter plot), according to some implementations.
[0142] Figure 11C shows how a dashboard is built progressively based on the input utterances. The system generates visualization 1140 in response to the utterance “average price by neighborhood”. When the user provides a subsequent utterance, “average price over time”, the system responds by generating a line chart (1152) in view 1150 that shows the progression of the average price over time. Now, if the user then provides the utterance “by home type”, the system augments the line chart (1152) with lines (1162) corresponding to the different house types in view 1160.
[0143] Referring again to Figure 11A, the underlying algorithm for creating or changing an existing view works as follows. First, the system determines (1106) whether creating a new view or changing an existing one is necessary. The system analyzes the attributes specified in the prospective centers Cf(U_{n+1}) (1100) and searches for any current visualization that encodes these data properties. If there is no match with the specification of the existing views, illustrated as the “No” arrow leaving decision block 1106, the system generates a new corresponding specification consisting of attributes and aggregation types. In Figure 11A, this is illustrated by step 1108, which decides the chart type (for example, a bar chart, a map chart, or a scatter plot) using an algorithm, according to some implementations. In some such implementations, the system employs an automatic presentation algorithm to decide the chart type generated based on this specification. In some of these implementations, the system uses a simplified version of the automatic presentation algorithm described in Show Me: Automatic Presentation for Visual Analysis, by J. Mackinlay, P. Hanrahan, and C. Stolte, which is incorporated herein by reference. Once the chart type (1110) is decided, the system generates (1112) a chart of that type to obtain a generated chart (1114). The system then positions the new chart (1114), according to some implementations. In some of these implementations, the system uses a layout algorithm based on a two-dimensional grid, automatically coordinating the presentation of the new chart (1114) with the other views of the visualization. The updated dashboard (1124) responds to subsequent utterances through actions such as highlighting or filtering.
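For illustration only, the sketch below stands in for the chart-type decision of step 1108 with a few hand-written rules loosely in the spirit of the automatic presentation approach; the rules and names are assumptions and not the referenced algorithm.

```python
# Simplified, assumed stand-in for chart-type selection from attribute types.
def choose_chart_type(attribute_types: list[str]) -> str:
    types = set(attribute_types)
    if "geographic" in types:
        return "map chart"
    if "temporal" in types and "quantitative" in types:
        return "line chart"
    if attribute_types.count("quantitative") >= 2:
        return "scatter plot"
    if "categorical" in types and "quantitative" in types:
        return "bar chart"
    return "bar chart"   # fallback

print(choose_chart_type(["categorical", "quantitative"]))   # "average price by neighborhood" -> bar chart
print(choose_chart_type(["temporal", "quantitative"]))      # "average price over time" -> line chart
```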
[0144] It is further noted that, although not shown in Figure 11A, the system repeats at least steps 1106, 1118, 1120, 1122, 1108, 1112, and 1116 for each data attribute in the list of data attributes (1104), according to some implementations.
[0145] Figure 12A illustrates a set of widgets generated to manage ambiguity in a user's query, according to some implementations. A challenge for natural language understanding systems that support interactive dialogue is determining the intent of the utterance. In some implementations, the system automatically resolves various forms of syntactic, lexical, and semantic ambiguity. These resolutions are expressed in the form of widgets and feedback to help the user understand the system's intent and how the utterance was interpreted. By manipulating these widgets and viewing the feedback about which results are shown in the visualization, the user can, for example, issue a follow-up repair utterance to override or clarify the decisions the system has made.
[0146] In some implementations, the system identifies one or more widgets from the analytical functions derived from an utterance. In some of these implementations, the system organizes and presents the widgets in an intuitive way so that the user can understand how the system interpreted the utterance and subsequently modify the interpretation using those widgets. To that end, the system takes the original utterance and orders the widgets in the same sequence as the corresponding query terms. In some of these implementations, the system does this using a library, such as Sparklificator®, which facilitates the compact placement of small word-scale visualizations within text. In addition, some implementations offer a set of interfaces to users, including the ability to manipulate and/or remove a widget, modify the query, and resolve ambiguous queries.
[0147] Figure 12A shows how the system presents the widgets for the utterance “condo near Ballard under 1.2M”, according to some implementations. In this example, the first term, “condo”, is resolved to the widget representing the criterion “HOME_TYPE equals Condo/coop” (1202). Then, the second widget conveys the fuzzy distance represented by “near Ballard” (1204). Since “under 1.2M” does not explicitly mention any attribute, the system determines whether the value 1200000 is within the range of minimum and maximum values of any numeric attribute in the data. If such an attribute exists (LAST_SALE_PRICE in this case), the system communicates this to the user (via widget 1206) and then allows the user to change the attribute using the drop-down menu (1208).
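The range check described above can be illustrated with the following sketch; the attribute ranges shown are invented sample values, and the function name is hypothetical.

```python
# When an utterance contains a bare numeric value ("under 1.2M") with no attribute,
# look for numeric attributes whose [min, max] range contains the value and surface
# them in a widget for the user to confirm or change.
def candidate_attributes(value: float, numeric_ranges: dict) -> list:
    return [attr for attr, (lo, hi) in numeric_ranges.items() if lo <= value <= hi]

ranges = {"LAST_SALE_PRICE": (150_000, 8_800_000), "SQFT": (300, 12_000), "BEDS": (0, 9)}
print(candidate_attributes(1_200_000, ranges))   # ['LAST_SALE_PRICE']
```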
[0148] In addition to dealing with ambiguity, in some implementations the system also offers feedback and useful tips for modifying the text when it fails to fully understand the query. For example, if the system is unable to successfully parse the given utterance, it first attempts to automatically correct misspelled terms by comparing tokens with related attributes, cell values, and keywords in the current data set using fuzzy string matching. When the user forms a query that is only partially recognized, the system prunes the unrecognized terms from the corresponding parse tree and then shows results based on the tokens that are understood. Figure 12B shows different example situations and the corresponding feedback generated by the system, according to some implementations.
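As an illustration, the sketch below performs the misspelling-correction step with fuzzy string matching from the Python standard library (difflib); the vocabulary shown is a placeholder for the attribute names, cell values, and keywords of the current data set, and the cutoff value is an assumption.

```python
import difflib

# Placeholder vocabulary; in the described system it would come from the data set.
VOCABULARY = ["condo", "townhome", "ballard", "green lake", "price", "beds", "near", "under"]

def correct_token(token: str, cutoff: float = 0.8) -> str:
    matches = difflib.get_close_matches(token.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token     # unrecognized tokens are left for pruning

print([correct_token(t) for t in "condoss near balard under 600k".split()])
# ['condo', 'near', 'ballard', 'under', '600k']
```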
[0149] Figures 13A to 13J show a flowchart illustrating a method 1300 for using (1302) natural language for visual analysis of a data set by applying pragmatic principles. The steps of method 1300 can be performed by a computer (for example, a computing device 200). In some implementations, the computer includes (1304) a display, one or more processors, and memory. Figures 13A to 13J correspond to instructions stored in a computer memory or computer-readable storage medium (for example, the memory 206 of the computing device 200). The memory stores (1306) one or more programs configured for execution by the one or more processors (for example, the processor(s) 202). For example, the operations of method 1300 are performed, at least in part, by a data visualization generation module 234 and/or a language processing module 238.
[0150] In some implementations, the device displays (1308) a data visualization based on a data set retrieved from a database using a first set of one or more queries. For example, referring to Figure 1, a user can associate one or more data fields from a schema information region 110 with one or more shelves (for example, the column shelf 120 and the row shelf 122, Figure 1) in the data visualization region 112. In response to receiving the user associations, in some implementations, the computer retrieves data for the data fields from the data set using a set of one or more queries and then displays a data visualization (for example, data visualization 408) in the data visualization region 112 that corresponds to the received user input. The display of data visualizations is discussed in more detail above with reference to Figure 1.
[0151] The computer receives (1310) a first user input to specify a first natural language command related to the displayed data visualization. In some implementations, the user input is received as text input (for example, via the keyboard 216 or via the touch screen 214) from a user in a data entry region on the display in proximity to the displayed data visualization. In some implementations, the user input is received as a voice command using a microphone (for example, an audio input device 220) coupled to the computer. For example, referring to Figure 4A, the displayed data visualization 408 concerns measles in the United Kingdom. Receiving input (for example, commands/queries) from a user is discussed in more detail above with reference to Figure 1.
[0152] Based on the displayed data visualization, the computer extracts (1312) a first set of one or more independent analytical phrases from the first natural language command. For example, referring to Figure 4A, the first natural language command received by the computer reads “measles in the uk”. The data visualization displayed before receiving the first natural language command concerns disease epidemics worldwide. In some implementations, the computer extracts “measles” and “in the uk” from the first natural language command, as these analytical phrases relate to the displayed data visualization. When the phrases refer directly to data fields in the displayed data visualization, the extraction (1312) is straightforward: it collects all the phrases that are direct references to the data fields. In some implementations, the computer applies stemming or removes stopwords (irrelevant words), filler words, or any set of words from the received query, and extracts (1312) all other phrases from the first natural language command, since they may be related to the displayed data visualization. Some implementations use this approach when the phrases in the natural language command refer only indirectly to the data fields in the displayed visualization.
[0153] The language processing module 238 calculates (1314) a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases, according to some implementations. A framework based on a conversational interaction model is described above with reference to Figures 3A, 5, and 11. A center refers to the entities that serve to link an utterance (sometimes called the natural language command) to other utterances in a discourse (a series of utterances). Conversation centers include data attributes and values, visual properties, and analytical actions. Calculating the conversation centers based on the analytical phrases includes mapping the analytical phrases to one or more conversation centers after the necessary transformations and analyses. For the “measles in the uk” utterance example, the language processing module 238 processes the phrase “measles” and analyzes the phrase to infer that it refers to the data attribute “DISEASE”, as illustrated in Figure 4D described above.
[0154] Subsequently, the language processing module 238 calculates (1316) a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases, according to some implementations. As described above with reference to Figures 3A, 5, and 11, each of the analytical functions consists of a variable, an operator, and a value, according to some implementations. In some implementations, for the “measles in the uk” example, the language processing module 238 creates two functions, namely F_CAT(diseases, ==, measles) and F_CAT(country, ==, uk). In some implementations, as another example, for the utterance “condos under 600K”, the language processing module 238 creates two functions, F_CAT(homeType, ==, condo) and F_NUMERIC(price, <, 600000). In both of these examples, the language processing module 238 searches for one or more attributes related to the displayed data visualization that correspond to the first set of one or more conversation centers to identify a first set of data attributes, according to some implementations. The language processing module 238 also identifies, by examining the first set of one or more conversation centers, a first set of operators (for example, the == operator or the < operator) and a first set of values corresponding to the first set of data attributes, according to some implementations. With the first set of variables (attributes), the corresponding first set of operators, and the first set of values, the language processing module 238 constructs the first set of one or more analytical functions, thus creating the first set of one or more functional phrases.
[0155] In some implementations, the computer updates (1318) the data visualization based on the first set of one or more functional phrases calculated in step 1316. As shown in Figure 13I, in some implementations, the computer requeries (1372) the database using a second set of one or more queries based on the first set of one or more functional phrases, thereby retrieving a second data set. In some cases, the requery is performed locally on the computing device using data stored or cached on the computing device. For example, requerying is usually performed locally when the natural language command specifies one or more filters. In some of these implementations, the computer updates (1374) the data visualization based on the second data set. In some implementations, the computer additionally creates and displays (1376) a new data visualization (for example, without updating one or more existing data visualizations) using the second data set.
[0156] Referring now to Figure 13B, the computer receives (1320) a second user input to specify a second natural language command related to the displayed data visualization. In some implementations, the user input is received as text input (for example, via the keyboard 216 or via the touch screen 214) from a user in a data entry region on the display in proximity to the displayed data visualization. In some implementations, the user input is received as a voice command using a microphone (for example, an audio input device 220) coupled to the computer. For example, referring to Figure 4A, the displayed data visualization 408 refers to measles in the United Kingdom when the computer receives the second user input, “show me the orange spike”. Receiving input (for example, commands/queries) from a user is discussed in more detail above with reference to Figure 1.
[0157] Based on the displayed data visualization, the computer extracts (1322) a second set of one or more independent analytical phrases from the second natural language command. For example, referring to Figure 4A, the second natural language command (410) received by the computer reads “show me the orange spike”. In some implementations, for this example, the computer extracts “the orange spike” from the second natural language command, as this analytical phrase relates to the displayed data visualization (which refers to measles in the United Kingdom and has an orange spike, a visualization property). When the phrases refer directly to data fields in the displayed data visualization, the extraction (1322) is straightforward: it collects all the phrases that are direct references to the data fields. In some implementations, the computer applies stemming or removes stopwords (irrelevant words), filler words, or any set of words from the received query, and extracts (1322) all other phrases from the second natural language command, since they may be related to the displayed data visualization. Some implementations use this approach when the phrases in the natural language command refer only indirectly to the data fields in the displayed visualization.
[0158] The language processing module calculates (1324) a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases, according to some implementations.
[0159] The language processing module derives (1326) a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules, according to some implementations. In some such implementations (1332), each of the conversation centers in the first set of one or more conversation centers, the temporary set of one or more conversation centers, and the second set of one or more conversation centers comprises a value for a variable (for example, a data attribute or a visualization property). In some such implementations, the language processing module applies the transition rules by performing a sequence of operations (as shown in Figure 13C) comprising: determining (1334) whether a first variable is included in the first set of one or more conversation centers; determining (1336) whether the first variable is included in the temporary set of one or more conversation centers; determining (1338) a respective transition rule of the one or more transition rules to be applied based on whether the first variable is included in the first set of one or more conversation centers and/or in the temporary set of one or more conversation centers; and applying (1339) the respective transition rule.
[0160] In some implementations, as shown in Figure 13D, the one or more transition rules used by the language processing module 238 comprise (1340) a CONTINUE rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and to add one or more conversation centers from the temporary set of one or more conversation centers to the second set of one or more conversation centers. In some such implementations, applying (1342) the respective transition rule comprises: in accordance with a determination that (i) the first variable is included in the temporary set of one or more conversation centers and (ii) the first variable is not included in the first set of one or more conversation centers, applying (1344) the CONTINUE rule to include the first variable in the second set of one or more conversation centers.
[0161] In some implementations, as illustrated in Figure 13E, the one or more transition rules used by the language processing module 238 comprise (1346) a RETAIN rule to retain each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers without adding any conversation center from the temporary set of one or more conversation centers to the second set of one or more conversation centers. In some such implementations, applying (1348) the respective transition rule comprises: in accordance with a determination that (i) the first variable is included in the first set of one or more conversation centers and (ii) the first variable is not included in the temporary set of one or more conversation centers, applying (1350) the RETAIN rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers.
[0162] In some implementations, as shown in Figure 13F, the one or more transition rules used by the language processing module 238 comprise (1352) a SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and to replace one or more conversation centers in the second set of one or more conversation centers with conversation centers from the temporary set of one or more conversation centers. In some of these implementations, applying (1354) the respective transition rule comprises: in accordance with a determination (1356) that (i) the first variable is included in the temporary set of one or more conversation centers and (ii) the first variable is also included in the first set of one or more conversation centers, the language processing module 238 performs a sequence of operations to: determine (1358) whether the value of the first variable in the first set of one or more conversation centers is different from the value of the first variable in the temporary set of one or more conversation centers; and, in accordance with a determination that the values of the first variable are different, apply (1360) the SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and replace the value for the first variable in the second set of one or more conversation centers with the value for the first variable in the temporary set of one or more conversation centers. In some such implementations, applying (1354) the respective transition rule additionally comprises, as shown in Figure 13G, determining (1362) whether a widget corresponding to the first variable has been removed by the user, and, in accordance with a determination that the widget has been removed, applying (1364) the SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and to replace the value for the first variable in the second set of one or more conversation centers with a new value (for example, a maximum value or a superset value) that includes the value of the first variable in the first set of one or more conversation centers.
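For illustration, the following sketch condenses the CONTINUE, RETAIN, and SHIFT rules of paragraphs [0160]-[0162] into a single function; the representation of centers as variable-to-value mappings, the widget-removal flag, and the “ANY” superset value are simplifying assumptions.

```python
# Minimal sketch of the transition rules (hypothetical names and representation).
def derive_forward_centers(first: dict, temp: dict, removed_widgets: set = frozenset()) -> dict:
    second = dict(first)                       # every center in the first set carries over
    for variable, value in temp.items():
        if variable not in first:
            second[variable] = value           # CONTINUE: add the new center
        elif first[variable] != value:
            second[variable] = value           # SHIFT: replace the conflicting value
        # RETAIN: variable present with the same value -> keep the first set's center
    for variable in removed_widgets & first.keys():
        second[variable] = "ANY"               # SHIFT to a superset value after widget removal
    return second

first = {"neighborhood": "Ballard", "price": "< 1M"}
print(derive_forward_centers(first, {"homeType": "townhome"}))
# {'neighborhood': 'Ballard', 'price': '< 1M', 'homeType': 'townhome'}
print(derive_forward_centers(first, {"neighborhood": "South Lake Union"}))
# {'neighborhood': 'South Lake Union', 'price': '< 1M'}
```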
[0163] Referring now to Figure 13B, the language processing module 238 calculates (1328) a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases. The language processing module 238 performs this step, using the second set of one or more conversation centers calculated in step 1326, in a manner similar to step 1316 described above.
[0164] The computer updates (1330) the data visualization based on the second set of one or more functional phrases, according to some implementations. In some implementations, as shown in Figure 13J, the computer requeries (1378) the database using a third set of one or more queries based on the second set of one or more functional phrases, thereby retrieving a third data set, and updates (1380) the data visualization based on the third data set. Additionally, in some such implementations, the computer creates and displays (1382) a new data visualization (for example, without updating one or more existing data visualizations) using the third data set.
[0165] In some implementations, as shown in Figure 13H, the computer additionally determines (1366) whether the user has selected a data set other than the first data set, determines (1368) whether the user has reset the data visualization, and, in accordance with a determination that (i) the user has selected a different data set or (ii) the user has reset the data visualization, resets (1370) each of the first set of one or more conversation centers, the temporary set of one or more conversation centers, and the second set of one or more conversation centers to an empty set that does not include any conversation centers.
[0166] Figures 14A to 14R present a flowchart illustrating a method 1400 for using (1402) natural language for visual analysis of a data set by applying pragmatic principles, including handling various forms of pragmatics, according to some implementations. The steps of method 1400 can be performed by a computer (for example, a computing device 200). In some implementations, the computer includes (1404) a display, one or more processors, and memory. Figures 14A to 14R correspond to instructions stored in a computer memory or computer-readable storage medium (for example, the memory 206 of the computing device 200). The memory stores (1406) one or more programs configured for execution by the one or more processors (for example, the processor(s) 202). For example, the operations of method 1400 are performed, at least in part, by a data visualization generation module 234 and/or by a language processing module 238.
[0167] In some implementations, the device displays (1408) a data visualization based on a data set retrieved from a database using a first set of one or more queries. For example, referring to Figure 1, a user can associate one or more data fields from a schema information region 110 with one or more shelves (for example, the column shelf 120 and the row shelf 122, Figure 1) in the data visualization region 112. In response to receiving the user associations, in some implementations, the computer retrieves data for the data fields from the data set using a set of one or more queries and then displays a data visualization (for example, the data visualization 408) in the data visualization region 112 that corresponds to the received user input. The display of data visualizations is discussed in more detail above with reference to Figure 1.
[0168] The computer receives (1410) a first user input to specify a first natural language command related to the displayed data visualization. In some implementations, the user input is received as text input (for example, via the keyboard 216 or via the touch screen 214) from a user in a data entry region on the display in proximity to the displayed data visualization. In some implementations, the user input is received as a voice command using a microphone (for example, an audio input device 220) coupled to the computer. For example, referring to Figure 6A, the displayed data visualization 608 refers to houses less than 1M in Ballard. Receiving input (for example, commands/queries) from a user is discussed in more detail above with reference to Figure 1.
[0169] Based on the displayed data visualization, the computer extracts (1412) a first set of one or more independent analytical phrases from the first natural language command. For example, referring to Figure 6A, the first natural language command received by the computer reads “houses less than 1M in Ballard”. The data visualization displayed before receiving the first natural language command relates to past home sales in Seattle. In some implementations, the computer extracts “houses”, “less than 1M”, and “in Ballard” from the first natural language command, as these analytical phrases relate to the displayed data visualization. When the phrases refer directly to data fields in the displayed data visualization, the extraction (1412) is straightforward: it collects all the phrases that are direct references to the data fields. In some implementations, the computer applies stemming or removes stopwords (irrelevant words), filler words, or any set of words from the received query, and extracts (1412) all other phrases from the first natural language command, since they may be related to the displayed data visualization. Some implementations use this approach when the phrases in the natural language command refer only indirectly to the data fields in the displayed visualization.
[0170] The language processing module 238 calculates (1414) a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases, according to some implementations. A framework based on a conversational interaction model is described above with reference to Figures 3A, 5, and 11. A center refers to the entities that serve to link an utterance (sometimes called the natural language command) to other utterances in a discourse (a series of utterances). Conversation centers include data attributes and values, visual properties, and analytical actions. Calculating the conversation centers based on the analytical phrases includes mapping the analytical phrases to one or more conversation centers after the necessary transformations and analyses. For the “houses less than 1M in Ballard” utterance example, the language processing module 238 processes the phrase “less than 1M” and analyzes it to infer that it refers to the LAST_SALE_PRICE data attribute, as illustrated in Figure 6C described above.
[0171] Subsequently, the language processing module 238 calculates (1416) a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases, according to some implementations. As described above with reference to Figures 3A, 5, and 11, each of the analytical functions consists of a variable, an operator, and a value, according to some implementations. In some implementations, for the “houses less than 1M in Ballard” utterance example, the language processing module 238 creates four functions: F_CAT(homeType, ==, condo), F_CAT(homeType, ==, townhouse), F_CAT(homeType, ==, single), and F_NUMERIC(price, <, 1000000). In this example, the language processing module 238 searches for one or more attributes related to the displayed data visualization that correspond to the first set of one or more conversation centers to identify a first set of data attributes, according to some implementations. The language processing module 238 also identifies, by examining the first set of one or more conversation centers, a first set of operators (for example, the == operator or the < operator) and a first set of values corresponding to the first set of data attributes, according to some implementations. With the first set of variables (attributes), the corresponding first set of operators, and the first set of values, the language processing module 238 constructs the first set of one or more analytical functions, thus creating the first set of one or more functional phrases.
[0172] In some implementations, the computer updates (1418) the data visualization based on a first set of one or more functional phrases calculated in step 1416.
[0173] Referring now to Figure 14B, the computer receives (1420) a second user input to specify a second natural language command related to the displayed data visualization. In some implementations, the user input is received as text input (for example, via the keyboard 216 or via the touch screen 214) from a user in a data entry region on the display in proximity to the displayed data visualization. In some implementations, the user input is received as a voice command using a microphone (for example, an audio input device 220) coupled to the computer. For example, referring to Figure 6A, the displayed data visualization 608 refers to houses less than 1M in Ballard when the computer receives the second user input, “townhomes”. Receiving input (for example, commands/queries) from a user is discussed in more detail above with reference to Figure 1.
[0174] Based on the displayed data visualization, the computer extracts (1422) a second set of one or more independent analytical phrases from the second natural language command. For example, referring to Figure 6A, the second natural language command (620) received by the computer reads “townhomes”. In some implementations, for this example, the computer extracts “townhomes” from the second natural language command, since this analytical phrase relates to the displayed data visualization (which concerns houses in Ballard). When the phrases refer directly to data fields in the displayed data visualization, the extraction (1422) is straightforward: it collects all the phrases that are direct references to the data fields. In some implementations, the computer applies stemming or removes stopwords (irrelevant words), filler words, or any set of words from the received query, and extracts (1422) all other phrases from the second natural language command, since they may be related to the displayed data visualization.
[0175] The language processing module calculates (1424) a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases, according to some implementations.
[0176] The language processing module calculates (1426) the cohesion between the first set of one or more analytical phrases and the second set of one or more analytical phrases, and derives a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers based on the cohesion, according to some implementations. As Figure 14C shows, in some implementations, calculating the cohesion comprises identifying (1434) a sentence structure from the second set of one or more analytical phrases. The calculation of the sentence structure is described above with reference to Figure 5 (steps 522 or 510) and through the example in Figure 6A (steps 610 and 622), according to some implementations. In some implementations, identifying the sentence structure comprises parsing (1436) the second natural language command by applying a probabilistic grammar (as explained with reference to steps 522 or 510, Figure 5), thereby obtaining a parsed output. In some implementations, this step additionally comprises deducing (1438) the syntactic structure using a part-of-speech analysis API provided by a natural language processing toolkit, again as described above with reference to Figure 5. In some implementations, the parsed output is resolved (1440) by the language processing module to corresponding categorical and data attributes. For example, for the utterance “townhomes” (620) in Figure 6A, the language processing module resolves the categorical attribute as being the house type. In some implementations, although not illustrated, the language processing module resolves the parsed output to corresponding categorical and data attributes after step 1442.
[0177] When the sentence structure is identified in step 1434, the language processing module identifies one or more pragmatic forms based on the sentence structure, according to some implementations. Subsequently, the language processing module derives (1446) the second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers based on the one or more identified forms of pragmatics. Figures 14D, 14E, 14H, and 14I, described below, illustrate how different types of pragmatic forms are identified and how the second set of one or more conversation centers is derived based on the identified form of pragmatics.
[0178] In some implementations, the language processing module 238 calculates (1430) a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases. The language processing module 238 performs this step, using the second set of one or more conversation centers calculated in step 1426, in a manner similar to step 1416 described above.
[0179] In some implementations, as shown in Figure 14P, the language processing module 238 calculates (14186) the semantic relatedness between the second set of one or more extracted analytical phrases and one or more data attributes included in the updated data visualization, and calculates (14188) analytical functions associated with the second set of one or more analytical phrases, thereby creating the second set of one or more functional phrases, based on the semantic relatedness of the one or more data attributes. In some implementations, the language processing module 238 calculates semantically related terms for lexically cohesive expressions. The process of calculating semantic relatedness and calculating analytical functions based on semantic relatedness is described above with reference to Figures 9A-9D. In some implementations, although not illustrated in Figure 14B, calculating the second set of one or more analytical functions and the second set of one or more functional phrases based on the semantic relatedness of data attributes is performed in addition to (rather than as an alternative to) step 1430 described above.
[0180] In some implementations, as shown in Figure 14Q, the language processing module 238 calculates the semantic kinship by training (14,190) a first model of neural network in a large text corpus, thereby learning vector representations of words. In some of these implementations (14,192), the first neural network model comprises the Word2vec® model. In some implementations, the language processing module 238 calculates (1494) a first word vector for a first word in a first sentence in the second set of one or more analytical sentences using a second neural network model, the first vector mapping word mapping the first word to the vector representations of words learned in step 14,190. In some of these implementations (14,196), the second model of neural network comprises the model of recurrent neural network. In some implementations, the language processing module calculates (14,198) a second word vector for a first data attribute in one or more data attributes using the second neural network model, the second word vector mapping the first data for vector representations of words learned in step 14,190. Although not shown in Figure 14Q, the calculation of the first word vector and the second word vector can be
performed in parallel by the language processing module 238. Subsequently, the language processing module calculates (14,200) the relatedness between the first word vector and the second word vector using a similarity metric, according to some implementations. In some of such implementations (14,202), the similarity metric is based at least on (i) the Wu-Palmer distance between the word senses associated with the first word vector and the second word vector, (ii) a weighting factor, and (iii) a pairwise cosine distance between the first word vector and the second word vector.
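By way of illustration only, and not as part of the claimed implementations, the following Python sketch shows one possible reading of such a similarity metric: a weighted blend of WordNet Wu-Palmer similarity and a cosine similarity over word vectors. The function names, the weighting scheme, and the use of NLTK's WordNet interface are assumptions introduced here for clarity.

import math
from nltk.corpus import wordnet as wn  # assumes the WordNet corpus is installed (nltk.download('wordnet'))

def cosine_similarity(v1, v2):
    # Pairwise cosine similarity between two word vectors.
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def wu_palmer_similarity(word1, word2):
    # Maximum Wu-Palmer similarity over the senses of the two words.
    scores = [s1.wup_similarity(s2) or 0.0
              for s1 in wn.synsets(word1)
              for s2 in wn.synsets(word2)]
    return max(scores, default=0.0)

def semantic_relatedness(word1, vec1, word2, vec2, weight=0.5):
    # Weighted blend of word-sense similarity and word-vector similarity;
    # the value of 'weight' is an illustrative assumption.
    return (weight * wu_palmer_similarity(word1, word2)
            + (1.0 - weight) * cosine_similarity(vec1, vec2))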
[0181] In some implementations, as shown in Figure 14R, the language processing module 238 obtains (14,204) word definitions for the second set of one or more analytical phrases from a publicly available dictionary, determines (14,206) whether the word definitions contain one or more predefined adjectives using a morphosyntactic analysis API provided by a library of natural language processing tools, and, according to the determination that the word definitions contain one or more predefined adjectives, maps the one or more predefined adjectives to one or more analytical functions. These operations were described above with reference to Figure 9B, according to some implementations. The language processing module 238 calculates the type of analytical function for one or more terms in the second set of one or more analytical phrases. For example, the term "cheapest" is mapped to Bottom_N (sale price).
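As an illustration of this mapping step (a sketch only; the adjective lists, attribute name, and function labels are assumptions, not the claimed implementation), a hypothetical Python fragment could look like the following:

# Predefined adjectives that may appear in dictionary definitions, mapped to
# analytical function types (hypothetical lists for illustration).
SUPERLATIVE_MAP = {
    "lowest": "Bottom_N",
    "smallest": "Bottom_N",
    "fewest": "Bottom_N",
    "highest": "Top_N",
    "largest": "Top_N",
    "greatest": "Top_N",
}

def map_term_to_function(definition_adjectives, target_attribute):
    # Return an (analytical function, attribute) pair for a query term whose
    # dictionary definition contains one of the predefined adjectives.
    for adjective in definition_adjectives:
        if adjective in SUPERLATIVE_MAP:
            return (SUPERLATIVE_MAP[adjective], target_attribute)
    return None

# The definition of "cheapest" contains the adjective "lowest":
print(map_term_to_function(["lowest", "priced"], "sale_price"))  # ('Bottom_N', 'sale_price')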
[0182] Referring now to Figure 14B, the computer updates (1432) the data visualization based on the second set of one or more functional phrases, according to some implementations.
[0183] Each of Figures 14D, 14E, 14H and 14I shows the steps taken by the language processing module 238 to deal with different types of pragmatic forms identified based on the sentence structure, and how the second set of one or more conversation centers is derived based on the identified pragmatic form, according to some implementations.
[0184] Figure 14D shows the steps taken by the language processing module 238 to derive the second set of one or more conversation centers for an incomplete utterance, according to some implementations. The language processing module 238 identifies (1448) the pragmatic form as an incomplete utterance by determining whether one or more linguistic elements are absent from the sentence structure, according to some implementations. In some implementations, the language processing module 238 subsequently derives (1450) the second set of one or more conversation centers by performing a sequence of operations (1452) that includes: determining (1454) a first subset of conversation centers in the first set of one or more conversation centers, the first subset of conversation centers corresponding to the one or more linguistic elements missing from the sentence structure, and calculating (1456) the second set of one or more conversation centers by combining the temporary set of one or more conversation centers with the first subset of conversation centers. Figure 6A described above shows an implementation that derives the second set of one or more conversation centers for incomplete utterances.
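A minimal sketch of this derivation, assuming conversation centers are represented as a mapping from a variable to a value (an assumption made here purely for illustration), might be:

def derive_centers_incomplete(previous_centers, temporary_centers, missing_elements):
    # previous_centers / temporary_centers: dicts mapping a variable (a data
    # attribute or visualization property) to its value.
    # missing_elements: variables absent from the new utterance's sentence structure.
    carried_over = {var: val for var, val in previous_centers.items()
                    if var in missing_elements}
    # Centers from the new (incomplete) utterance take precedence.
    return {**carried_over, **temporary_centers}

# Hypothetical example: a command about houses in Ballard followed by the
# incomplete utterance "townhomes".
previous = {"last_sale_price": "< 1M", "neighborhood": "Ballard", "home_type": "house"}
temporary = {"home_type": "townhouse"}
print(derive_centers_incomplete(previous, temporary, {"last_sale_price", "neighborhood"}))
# {'last_sale_price': '< 1M', 'neighborhood': 'Ballard', 'home_type': 'townhouse'}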
[0185] Figure 14E shows the steps taken by the language processing module 238 to derive the second set of one or more conversation centers for reference expressions, according to some implementations. The language processing module 238 identifies (1458) the pragmatic form as a reference expression by determining whether one or more anaphoric references are present in the sentence structure, according to some implementations. In some implementations, the language processing module 238 subsequently derives (1460) the second set of one or more conversation centers by performing a sequence of operations (1462) that includes: searching (1464) the first set of one or more conversation centers to find a first subset of conversation centers that corresponds to a phrasal block in the second natural language command that contains a first anaphoric reference of the one or more anaphoric references, and calculating (1466) the second set of one or more conversation centers based on the temporary set of one or more conversation centers and the first subset of conversation centers. Figure 7A described above shows an implementation that derives the second set of one or more conversation centers for reference expressions.
[0186] In some implementations, the language processing module 238 determines (1468) whether the first anaphoric reference is a reference to a visualization property in the updated data visualization (sometimes called a deictic reference), and, according to a determination that the anaphoric reference is a deictic reference, calculates (1470) the second set of one or more conversation centers based on the temporary set of one or more conversation centers and data related to the visualization property.
[0187] In some implementations, as illustrated in Figure 14F, the language processing module 238 determines (1472) whether the first anaphoric reference is accompanied by a verb in the second natural language command, and, according to a determination that the anaphoric reference is accompanied by a verb (1474), searches (1476) the first set of one or more conversation centers to find a first action conversation center that refers to an action verb, and calculates (1478) the second set of one or more conversation centers based on the temporary set of one or more conversation centers, the first subset of conversation centers, and the first action conversation center.
[0188] In some implementations, as shown in Figure 14G, the language processing module 238 determines (1480) whether the first anaphoric reference is a deictic reference that refers to some object in the environment, and, according to a determination that the anaphoric reference is a deictic reference, calculates (1482) the second set of one or more conversation centers based on the temporary set of one or more conversation centers and a characteristic of the object.
[0189] Figure 14H shows the steps taken by the language processing module 238 to derive the second set of one or more conversation centers for repair statements, according to some implementations. The language processing module 238 identifies (1484) the pragmatic form as a repair statement by determining whether the sentence structure corresponds to one or more predefined repair statements, according to some implementations. In some implementations, the language processing module 238 subsequently derives (1486) the second set of one or more conversation centers by performing a sequence of operations (1488) that includes: calculating (1490) the second set of one or more conversation centers based on the temporary set of one or more conversation centers, and updating (1492) one or more data attributes in the second set of one or more conversation centers based on the one or more predefined repair statements and the sentence structure. Figure 10A described above shows an implementation that derives the second set of one or more conversation centers for repair statements.
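The following Python sketch illustrates one possible reading of this step; the repair patterns and the dictionary representation of conversation centers are assumptions for illustration, not the claimed implementation:

import re

# Hypothetical predefined repair patterns.
REPAIR_PATTERNS = [r"^no,?\s+", r"^actually\s+", r"^i meant\s+"]

def is_repair_statement(command):
    return any(re.match(p, command.strip(), re.IGNORECASE) for p in REPAIR_PATTERNS)

def apply_repair(previous_centers, temporary_centers):
    # Start from the previous centers and overwrite the data attributes named
    # by the repair statement's temporary centers.
    repaired = dict(previous_centers)
    repaired.update(temporary_centers)
    return repaired

command = "no, townhomes in Fremont"
if is_repair_statement(command):
    print(apply_repair({"home_type": "condo", "neighborhood": "Ballard"},
                       {"home_type": "townhouse", "neighborhood": "Fremont"}))
    # {'home_type': 'townhouse', 'neighborhood': 'Fremont'}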
[0190] In some implementations, the language processing module 238 determines (1494) whether the sentence structure corresponds to a repair statement to modify a default behavior related to the display of a data visualization, and, according to a determination that the sentence structure corresponds to a repair statement to modify a default behavior, modifies (1496) the default behavior related to the display.
[0191] Figure 14I shows the steps taken by the language processing module 238 to derive the second set of one or more conversation centers for conjunctive expressions, according to some implementations. The language processing module 238 identifies (1498) the pragmatic form as a conjunctive expression by (i) determining the explicit or implicit presence of conjunctions in the sentence structure, and (ii) determining whether the temporary set of one or more conversation centers includes each conversation center in the first set of one or more conversation centers, according to some implementations. In some implementations, the language processing module 238 subsequently derives (14,100) the second set of one or more conversation centers by calculating (14,104) the second set of one or more conversation centers based on the temporary set of one or more conversation centers, according to the determination (14,102) that the second natural language command is a conjunctive expression. Figure 8A described above shows an implementation that derives the second set of one or more conversation centers for utterances with conjunctions.
[0192] In some implementations, the language processing module 238 determines (14,106) whether the second natural language command has more than one set, and, according to the determination that the second natural language command has more than one set, calculates (14,108) the second set of one or more analytical functions by linearizing the second natural language command.
[0193] In some implementations, the language processing module 238 linearizes the second natural language command by performing a sequence of operations illustrated in Figure 14J. The sequence of operations includes generating (14,110) an analysis tree for the second natural language command, traversing (14,111) the analysis tree in post-order to extract a first analytical phrase and a second analytical phrase, in which the first analytical phrase and the second analytical phrase are adjacent nodes in the analysis tree, calculating a first analytical function and a second analytical function corresponding to the first analytical phrase and the second analytical phrase, respectively, and combining (14,114) the first analytical function with the second analytical function by applying one or more logical operators based on one or more characteristics of the first analytical function and the second analytical function, in which the one or more characteristics include attribute type, operator type and a value.
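A simplified sketch of this linearization (the tree representation and helper names are assumptions; the combination rule is supplied separately, as in the sketch following paragraph [0199]) could be:

class Node:
    def __init__(self, label, children=None, phrase=None):
        self.label = label
        self.children = children or []
        self.phrase = phrase  # set on leaf nodes that hold an analytical phrase

def post_order_phrases(node):
    # Collect analytical phrases from the leaves in post-order.
    phrases = []
    for child in node.children:
        phrases.extend(post_order_phrases(child))
    if node.phrase is not None:
        phrases.append(node.phrase)
    return phrases

def linearize(root, to_function, combine):
    # Map each extracted phrase to an analytical function and fold adjacent
    # functions together using the supplied combine() rule.
    functions = [to_function(p) for p in post_order_phrases(root)]
    result = functions[0]
    for nxt in functions[1:]:
        result = combine(result, nxt)
    return result

tree = Node("S", [Node("NP", phrase="price < 500k"), Node("NP", phrase="price > 900k")])
print(linearize(tree, to_function=lambda p: p, combine=lambda a, b: f"({a}) OR ({b})"))
# (price < 500k) OR (price > 900k)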
[0194] Each of Figures 14K-14O illustrates different instances of the last step (14,114) of combining the first analytical function with the second analytical function, according to some implementations. In each case (as illustrated by the labels 14,116, 14,126, 14,136, 14,150, and 14,168 in the respective figures), the first analytical function comprises a first attribute, a first operator and a first value; the second analytical function comprises a second attribute, a second operator and a second value.
[0195] In Figure 14K, the language processing module 238 combines (14,118) the first analytical function with the second analytical function by performing a sequence of operations, according to some implementations. The sequence of operations includes: determining (14,120) whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining (14,122) whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute are identical and both are categorical attributes, applying (14,124) a union operator to combine the first analytical function and the second analytical function.
[0196] In Figure 14L, the language processing module 238 combines (14,128) the first analytical function with the second analytical function by performing a sequence of operations, according to some implementations. The sequence of operations includes: determining (14,130) whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining (14,132) whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute are not identical, applying (14,134) the intersection operator to combine the first analytical function and the second analytical function.
[0197] In Figure 14M, the language processing module 238 combines (14,138) the first analytical function with the second analytical function by performing a sequence of operations, according to some implementations. The sequence of operations includes: determining (14,140) whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining (14,142) whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute are identical and both are ordered type attributes (14,144): determining (14,146) the operator types of the first operator and the second operator, and, according to a determination that both the first operator and the second operator are equality operators, applying (14,148) the union operator to combine the first analytical function and the second analytical function.
[0198] In Figure 14N, the language processing module 238 combines (14,152) the first analytical function with the second analytical function by performing a sequence of operations, according to some implementations. The sequence of operations includes: determining (14,154) whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining (14,156) whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute are identical and both are ordered type attributes: determining (14,160) the operator types of the first operator and the second operator; according to a determination that the first operator is a "less than" operator and the second operator is a "greater than" operator (14,162): determining (14,164) whether the first value is less than the second value, and, according to a determination that the first value is less than the second value, applying (14,166) the union operator to combine the first analytical function and the second analytical function.
[0199] In Figure 14O, the language processing module 238 combines (14,170) the first analytical function with the second analytical function by performing a sequence of operations, according to some implementations. The sequence of operations includes: determining (14,172) whether the first attribute is a categorical type attribute or an ordered type attribute, and determining whether the second attribute is a categorical type attribute or an ordered type attribute; determining (14,174) whether the first attribute and the second attribute are identical; and, according to a determination that the first attribute and the second attribute are identical and both are ordered type attributes (14,176): determining (14,178) the operator types of the first operator and the second operator; according to a determination that the first operator is a "greater than" operator and the second operator is a "less than" operator (14,180): determining (14,182) whether the first value is less than the second value, and, according to a determination that the first value is less than the second value, applying (14,184) the intersection operator to combine the first analytical function and the second analytical function.
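The decision rules of Figures 14K-14O can be summarized in a single hypothetical routine such as the one below; the tuple representation of an analytical function and the fallback branch are simplifying assumptions, not the claimed implementation:

def combine_functions(f1, f2):
    # Each analytical function is modeled as (attribute, attribute_type, operator, value);
    # the return value names the logical operator applied.
    attr1, type1, op1, val1 = f1
    attr2, type2, op2, val2 = f2
    if attr1 != attr2:
        return "intersection"                 # Figure 14L: different attributes
    if type1 == "categorical" and type2 == "categorical":
        return "union"                        # Figure 14K: same categorical attribute
    if type1 == "ordered" and type2 == "ordered":
        if op1 == "==" and op2 == "==":
            return "union"                    # Figure 14M: two equality filters
        if op1 == "<" and op2 == ">" and val1 < val2:
            return "union"                    # Figure 14N: disjoint ranges
        if op1 == ">" and op2 == "<" and val1 < val2:
            return "intersection"             # Figure 14O: a bounded range
    return "intersection"                     # assumed fallback (not specified above)

print(combine_functions(("price", "ordered", ">", 500000),
                        ("price", "ordered", "<", 900000)))  # intersection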
[0200] Figures 15A to 15H show a flow chart of a process (method 1500) that uses (1502) natural language for visual analysis of a data set by applying pragmatic principles, including managing responses and feedback and handling ambiguity in a user query, according to some implementations. The steps of method 1500 can be performed by a computer (for example, a computing device 200). In some implementations, the computer includes (1504) a display medium, one or more processors, and a memory. Figures 15A to 15H correspond to instructions stored in a computer memory or computer-readable storage medium (for example, the memory 206 of the computing device 200). The memory stores (1506) one or more programs configured to be executed by the one or more processors (for example, the processor(s) 202). For example, the operations of method 1500 are performed, at least in part, by a data visualization generation module 234 and/or a language processing module 238.
[0201] In some implementations, the computer displays (1508) a data visualization based on a data set retrieved from a database using a first set of one or more queries. For example, referring to Figure 1, a user can associate one or more data fields from a schema information region 110 with one or more shelves (for example, the column shelf 120 and the row shelf 122, Figure 1) in the data visualization region 112. In response to receiving the user associations, in some implementations, the computer retrieves data for the data fields from the data set using a set of one or more queries, and then displays a data visualization (for example, data visualization 408) in the data visualization region 112 that corresponds to the received user input. The display of data visualizations is discussed in more detail above with reference to Figure 1.
[0202] The computer receives (1510) a first user input to specify a first natural language command related to the displayed data visualization. In some implementations, the user input is received as text input (for example, via the keyboard 216 or via the touch screen 214) from a user in a data entry region of the display in proximity to the displayed data visualization. In some implementations, the user input is received as a voice command using a microphone (for example, an audio input device 220) coupled to the computer. For example, referring to Figure 6A, the displayed data visualization 608 refers to "houses less than 1M in Ballard". Receiving inputs (for example, commands/queries) from a user is discussed in more detail above with reference to Figure 1.
[0203] Based on the displayed data visualization, the computer extracts (1512) a first set of one or more independent analytical phrases from the first natural language command. For example, referring to Figure 6A, the first natural language command received by the computer reads "houses less than 1M in Ballard". The data visualization displayed before receiving the first natural language command relates to previous home sales in Seattle. In some implementations, the computer extracts "houses", "less than 1M", and "in Ballard" from the first natural language command, as these analytical phrases relate to the displayed data visualization. When the phrases refer directly to the data fields in the displayed data visualization, the extraction (1512) is straightforward: the computer collects all phrases that are direct references to the data fields. In some implementations, the computer stems or removes stop words, filler words, or any set of words from the received query, and extracts (1512) all other phrases from the first natural language command, since they may be related to the displayed data visualization. Some implementations use this approach when the phrases in the natural language command have some indirect reference to the data fields in the displayed visualization.
[0204] The language processing module 238 calculates (1514) a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases, according to some implementations. A framework based on a model of conversational interaction is described above with reference to Figures 3A, 5 and 11. A center refers to the entities that serve to link an utterance (sometimes called the natural language command) to other utterances in a discourse (a series of utterances). Conversation centers include data attributes and values, visual properties, and analytical actions. Calculating the conversation centers based on the analytical phrases includes mapping the analytical phrases to one or more conversation centers after the necessary transformations and analyses. For the example utterance "houses less than 1M in Ballard", the language processing module 238 processes the phrase "less than 1M" and analyzes it to infer that it refers to the LAST_SALE_PRICE data attribute, as illustrated in Figure 6C described above.
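A possible (purely hypothetical) representation of a conversation center, shown here only to make the discussion concrete, is a small record holding the referenced attribute or property together with an operator and value:

from dataclasses import dataclass
from typing import Optional

@dataclass
class ConversationCenter:
    kind: str                       # "data_attribute", "visual_property", or "action"
    name: str                       # e.g. "LAST_SALE_PRICE" or "color"
    operator: Optional[str] = None  # e.g. "<" or "=="
    value: Optional[object] = None  # e.g. 1000000 or "Ballard"

# The utterance "houses less than 1M in Ballard" might yield centers such as:
centers = [
    ConversationCenter("data_attribute", "LAST_SALE_PRICE", "<", 1000000),
    ConversationCenter("data_attribute", "NEIGHBORHOOD", "==", "Ballard"),
    ConversationCenter("data_attribute", "HOME_TYPE", "==", "house"),
]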
[0205] Subsequently, the language processing module 238 calculates (1516) a first set of analytical functions associated with the first set of one or more conversation centers, thereby creating a first set of one or more functional phrases, according to some implementations. As described above with reference to Figures 3A, 5 and 11, each of the analytical functions consists of a variable, an operator and a value, according to some implementations. In some implementations, for the example utterance "houses less than 1M in Ballard", the language processing module 238 creates four functions: F_CAT (homeType, == condo), F_CAT (homeType, == townhouse), F_CAT (homeType == single), and F_NUMERIC (price, <, 500000). In this example, the language processing module 238 searches for one or more attributes related to the displayed data visualization that correspond to the first set of one or more conversation centers to identify a first set of data attributes, according to some implementations. The language processing module 238 also identifies, when examining the first set of one or more conversation centers, a first set of operators (for example, the == operator and the < operator) and a first set of values corresponding to the first set of data attributes, according to some implementations. With the first set of variables (attributes), and the corresponding first set of operators and first set of values, the language processing module 238 constructs the first set of one or more analytical functions, thus creating the first set of one or more functional phrases.
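A hedged sketch of this construction step, with hypothetical F_CAT and F_NUMERIC constructors and a simple dispatch on attribute type (all assumptions for illustration), could read:

def f_cat(attribute, operator, value):
    return ("F_CAT", attribute, operator, value)

def f_numeric(attribute, operator, value):
    return ("F_NUMERIC", attribute, operator, value)

def build_functional_phrases(triples, categorical_attributes):
    # triples: (attribute, operator, value) tuples identified from the
    # first set of conversation centers.
    phrases = []
    for attribute, operator, value in triples:
        if attribute in categorical_attributes:
            phrases.append(f_cat(attribute, operator, value))
        else:
            phrases.append(f_numeric(attribute, operator, value))
    return phrases

print(build_functional_phrases(
    [("homeType", "==", "townhouse"), ("price", "<", 1000000)],
    categorical_attributes={"homeType", "neighborhood"},
))
# [('F_CAT', 'homeType', '==', 'townhouse'), ('F_NUMERIC', 'price', '<', 1000000)]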
[0206] In some implementations, the computer updates (1518) the data visualization based on the first set of one or more functional phrases calculated in step 1516.
[0207] Referring now to Figure 15B, the computer receives (1520) a second user input to specify a second natural language command related to the displayed data visualization. In some implementations, the user input is received as text input (for example, via the keyboard 216 or via the touch screen 214) from a user in a data entry region of the display near the displayed visualization. In some implementations, the user input is received as a voice command using a microphone (for example, an audio input device 220) coupled to the computer. For example, referring to Figure 6A, the displayed data visualization 608 refers to "houses less than 1M in Ballard" when the computer receives the second user input "townhomes". Receiving inputs (for example, commands/queries) from a user is discussed in more detail above with reference to Figure 1.
[0208] Based on the displayed data visualization, the computer extracts (1522) a second set of one or more independent analytical phrases from the second natural language command. For example, referring to Figure 6A, the second natural language command (620) received by the computer reads "townhomes". In some implementations, for this example, the computer extracts "townhomes" from the second natural language command, since this analytical phrase relates to the displayed data visualization (which concerns houses in Ballard). When the phrases refer directly to the data fields in the displayed data visualization, the extraction (1522) is straightforward: the computer collects all phrases that are direct references to the data fields. In some implementations, the computer stems or removes stop words, filler words, or any set of words from the received query, and extracts (1522) all other phrases from the second natural language command, since they may be related to the displayed data visualization. Some implementations use this approach when the phrases in the natural language command have some indirect reference to the data fields in the displayed visualization.
[0209] The language processing module calculates (1524) a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases, according to some implementations.
[0210] The language processing module derives (1526) a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules, according to some implementations.
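The following Python sketch gives one reading of the CONTINUE, RETAIN and SHIFT transition rules described in the claims below; the dictionary representation of the centers is an assumption, and the routine is illustrative rather than definitive:

def apply_transition_rules(first_centers, temporary_centers):
    # Every rule carries the first set of conversation centers forward.
    second_centers = dict(first_centers)
    for variable, value in temporary_centers.items():
        if variable not in first_centers:
            # CONTINUE: add the new center introduced by the second command.
            second_centers[variable] = value
        elif first_centers[variable] != value:
            # SHIFT: replace the conflicting value with the new one.
            second_centers[variable] = value
        # RETAIN: variables present only in the first set are kept as-is,
        # which the initial copy already guarantees.
    return second_centers

first = {"neighborhood": "Ballard", "home_type": "house"}
temporary = {"home_type": "townhouse", "price": "< 600000"}
print(apply_transition_rules(first, temporary))
# {'neighborhood': 'Ballard', 'home_type': 'townhouse', 'price': '< 600000'}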
[0211] The computer updates (1528) the data visualization based on the second set of one or more functional phrases, according to some implementations.
[0212] Referring to Figure 15C, in some implementations, the language processing module 238 determines (1530) one or more data attributes corresponding to the second set of one or more conversation centers. The language processing module 238 then scans (1532) the displayed data visualizations to identify one or more of the displayed data visualizations that contain data marks whose characteristics correspond to a first data attribute of the one or more data attributes, according to some implementations. In some of such implementations (1534), the visualization characteristics include one or more of color, size and shape. In some of these implementations (1536), the visualization characteristics correspond to a visual encoding of the data marks.
[0213] Subsequently, the computer highlights (1538) the data marks whose characteristics correspond to the first data attribute, according to some implementations. In some of such implementations, the computer filters (1540) the results of the displayed data visualizations that contain data marks whose characteristics do not match the one or more data attributes. In addition, in some of such implementations, the computer receives (1542) input from the user to determine whether to filter or to highlight the data marks, and filters or highlights the data marks in the displayed data visualizations based on the determination. Figure 11A described above shows an example of applying the principles of pragmatics to manage responses and feedback, according to some implementations. The descriptions for Figure 11A apply to the steps illustrated in Figure 15C. For example, step 1102 in Figure 11A to create a list of all data attributes corresponds to step 1530 to determine one or more data attributes. Similarly, step 1106 to decide which of the existing visualizations encodes a respective attribute corresponds to step 1532 to scan the displayed data visualizations.
[0214] Referring now to Figure 15D, in some implementations, the computer determines (1544) whether none of the displayed data visualizations contains data marks whose characteristics correspond to the first data attribute. In some implementations, according to the determination that none of the displayed data visualizations contains data marks whose characteristics correspond to the first data attribute (1546), the computer generates (1548) a specification for a new data visualization with the first data attribute, and displays (1550) the new data visualization. In some implementations, displaying (1550) the new data visualization includes: determining (1552) a chart type based on the specification, and generating and displaying (1554) the chart. In some of these implementations (1556), the chart is positioned using a layout algorithm based on a two-dimensional grid, automatically coordinated with other data visualizations.
[0215] Referring to Figure 15E, in some implementations, the language processing module 238 calculates (1558) a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases. In some implementations, the computer selects (1560) a first functional phrase from the second set of one or more functional phrases, in which the first functional phrase comprises a parameterized data selection criterion. In some implementations, the computer selects (1562) an initial range for the parameter values of the parameterized data selection criterion. In some implementations, the computer displays (1564) an editable user interface control corresponding to the parameterized data selection criterion, where the user interface control displays the current values of the parameters. In some of these implementations (1566), the user interface control allows adjustment of the first functional phrase. Additionally, in some of such implementations (1568), the user interface control displays a slider, which allows a user to adjust the first functional phrase. In some implementations, the computer orders (1570) a displayed set of one or more editable user interface controls based on the order of the queries in the second natural language command, in which the order of the queries is inferred while extracting the second set of one or more analytical phrases from the second natural language command. In some of such implementations, the computer uses (1572) a library that facilitates the compact placement of small word-scale visualizations within text. In some of these implementations (1574), the library is Sparklificator®. Figure 12A described above shows an example interface illustration that includes a selectable set of widgets presented to the user, according to some implementations.
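One hypothetical shape for such an editable control, sketched here only to illustrate a parameterized data selection criterion with an adjustable value (the field names and clamping behavior are assumptions), is:

from dataclasses import dataclass

@dataclass
class AmbiguityWidget:
    functional_phrase: str   # e.g. "price < 600000"
    attribute: str           # e.g. "price"
    minimum: float           # initial range for the parameter
    maximum: float
    current_value: float

    def adjust(self, new_value):
        # Clamp the slider adjustment to the allowed range and return the
        # updated selection criterion.
        self.current_value = max(self.minimum, min(self.maximum, new_value))
        return f"{self.attribute} < {self.current_value:g}"

widget = AmbiguityWidget("price < 600000", "price", 0, 2_000_000, 600_000)
print(widget.adjust(750_000))  # price < 750000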
[0216] Referring to Figure 15F, in some implementations, the
computer determines (1576) a first symbol in the second natural language command that does not match any of the analytical phrases in the second set of one or more analytical phrases. In some implementations, the computer searches (1578) for a correctly spelled term corresponding to the first symbol using a search library by comparing the first symbol with one or more aspects of the first data set. In some of such implementations (1580), the one or more aspects include data attributes, cell values and related keywords from the first data set. In some of such implementations (1582), the search library is Fuse.js®.
[0217] In some implementations, the language processing module 238 substitutes (1584) the correctly spelled term for the first symbol in the second natural language command to obtain a third natural language command, and extracts (1586) the second set of one or more analytical phrases from the third natural language command.
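Since Fuse.js® is a JavaScript library, the sketch below uses Python's standard difflib module as a stand-in fuzzy matcher to illustrate the same idea; the vocabulary, cutoff, and function names are assumptions, not the claimed implementation:

import difflib

def correct_symbol(symbol, data_attributes, cell_values, keywords, cutoff=0.75):
    # Return the closest correctly spelled term drawn from the data set's
    # attributes, cell values, and related keywords, or None if nothing is close enough.
    vocabulary = list(data_attributes) + list(cell_values) + list(keywords)
    matches = difflib.get_close_matches(symbol, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def repair_command(command, symbol, correction):
    # Substitute the corrected term for the misspelled symbol.
    return command.replace(symbol, correction) if correction else command

correction = correct_symbol("Balard", ["neighborhood", "price"], ["Ballard", "Fremont"], ["houses"])
print(repair_command("townhomes in Balard", "Balard", correction))  # townhomes in Ballard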
[0218] In some implementations, as illustrated in Figure 15G, the language processing module 238 determines (1588) whether there is no correctly spelled term corresponding to the first symbol. According to a determination that there is no correctly spelled term corresponding to the first symbol (1590), the language processing module analyzes (1592) the second natural language command to obtain an analysis tree, prunes (1594) the analysis tree to remove the part of the tree corresponding to the first symbol, and extracts (1596) the second set of one or more analytical phrases from the pruned analysis tree. In some of such implementations, the language processing module 238 substitutes (1598) the correctly spelled term for the first symbol in the second natural language command to obtain a third natural language command, and the computer displays (15,100) the first symbol.
[0219] In some implementations, as illustrated in Figure 15H, the computer generates (15,102) textual feedback indicating that the correctly spelled term is used as a substitute for the first symbol in the second natural language command. In addition, in some of these implementations, the computer displays (15,104) and highlights the correctly spelled term. Figure 12B described above shows different examples of situations and the corresponding feedback generated by the computer, according to some implementations.
[0220] The terminology used in describing this invention serves only to describe specific implementations and is not intended to limit the present invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or", as used here, refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated aspects, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other aspects, steps, operations, elements, components and/or groups thereof.
[0221] The preceding description, for the purpose of explanation, has been described with reference to specific implementations. However, the above illustrative discussions are not intended to be exhaustive or to limit the invention to the exact forms disclosed. Many modifications and variations are possible in light of the above teachings. The implementations were chosen and described in order to better explain the principles of the invention and their practical applications, thus allowing other individuals skilled in the art to make the best use of the invention and various implementations with various modifications as are appropriate to the specific use contemplated.
权利要求:
Claims (71)
[1]
1. Method to use natural language for visual analysis of a data set, CHARACTERIZED by comprising:
on a computer provided with a display medium, one or more processors, and a memory storing one or more programs configured to be executed by the one or more processors:
display a data visualization based on a first set of data retrieved from a database using a first set of one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first set of one or more functional phrases; and update the data visualization based on the first set of one or more functional phrases.
[2]
2. Method according to claim 1, CHARACTERIZED by additionally comprising:
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases;
derive a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules;
calculate a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases; and update the data visualization based on the second set of one or more functional phrases.
[3]
3. Method, according to claim 2, CHARACTERIZED by the fact that each of the conversation centers of the first set of one or more conversation centers, the temporary set of one or more conversation centers, and the second set of a or more conversation centers comprise a value for a variable that specifies either a data attribute or a data visualization property, and where using one or more transition rules comprises:
determine whether a first variable is included in the first set of one or more conversation centers;
determine whether the first variable is included in the temporary set of one or more conversation centers;
determine a respective transition rule of the one or more transition rules to be applied based on whether the first variable is included in the first set of one or more conversation centers and / or in the temporary set of one or more conversation centers; and
apply the respective transition rule.
[4]
4. Method, according to claim 3, CHARACTERIZED by the fact that the one or more transition rules comprise a CONTINUE rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and add one or more conversation centers from the temporary set of one or more conversation centers to the second set of one or more conversation centers.
[5]
5. Method, according to claim 4, CHARACTERIZED by the fact that applying the respective transition rule comprises:
according to a determination that (i) the first variable is included in the temporary set of one or more conversation centers, and (ii) that the first variable is not included in the first set of one or more conversation centers, apply the CONTINUE rule to include the first variable in the second set of one or more conversation centers.
[6]
6. Method, according to claim 3, CHARACTERIZED by the fact that the one or more transition rules comprise a RETAIN rule to retain each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers without adding any conversation centers from the temporary set of one or more conversation centers to the second set of one or more conversation centers.
[7]
7. Method, according to claim 6, CHARACTERIZED by the fact that applying the respective transition rule comprises:
according to a determination that (i) the first variable is included in the first set of one or more conversation centers, and (ii) that the first variable is not included in the temporary set of one or more conversation centers, apply the RETAIN rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers.
[8]
8. Method, according to claim 3, CHARACTERIZED by the fact that the one or more transition rules comprise a SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and replace one or more conversation centers in the second set of one or more conversation centers with conversation centers from the temporary set of one or more conversation centers.
[9]
9. Method, according to claim 8, CHARACTERIZED by the fact that applying the respective transition rule comprises:
according to a determination that (i) the first variable is included in the first set of one or more conversation centers, and (ii) that the first variable is included in the temporary set of one or more conversation centers:
determining whether a first value of the first variable in the first set of one or more conversation centers is different from a second value of the first variable in the temporary set of one or more conversation centers;
according to a determination that the first value is different from the second value, apply the SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and replace the value for the first variable in the second set of one or more conversation centers with the second value.
[10]
10. The method of claim 8, further characterized by comprising:
determine whether a widget corresponding to the first variable has been removed
by the user; and according to a determination that the widget has been removed, apply the SHIFT rule to include each conversation center in the first set of one or more conversation centers in the second set of one or more conversation centers, and replace the value for the first variable in the second set of one or more conversation centers with a new value that includes the first value.
[11]
11. Method according to claim 1, CHARACTERIZED by additionally comprising:
determine whether the user selected a data set other than the first data set;
determine whether the user has restarted the data visualization; and according to a determination that (i) the user selected a different data set, or (ii) the user restarted the data visualization, restart each of the first set of one or more conversation centers, the temporary set of one or more conversation centers, and the second set of one or more conversation centers to an empty set that does not include any conversation centers.
[12]
12. Method, according to claim 1, CHARACTERIZED by the fact that updating the data visualization based on the first set of one or more functional phrases comprises:
repeat the query to the database using a second set of one or more queries based on the first set of one or more functional phrases, thereby retrieving a second set of data; and update the data view based on the second data set.
[13]
13. Method according to claim 12, CHARACTERIZED by additionally understanding creating and displaying a new data visualization using the second data set.
[14]
14. Method, according to claim 2, CHARACTERIZED by the fact that updating the data visualization based on the second set of one or more functional phrases comprises:
repeat the query to the database using a third set of one or more queries based on the second set of one or more functional phrases, thereby retrieving a third set of data; and update the data view based on the third data set.
[15]
15. Method, according to claim 14, CHARACTERIZED by additionally understanding creating and displaying a new data visualization using the third data set.
[16]
16. Electronic device, CHARACTERIZED by comprising:
a display medium;
one or more processors;
memory; and one or more programs, in which the one or more programs are stored in memory and configured to run by one or more processors, the one or more programs including instructions for:
display a data visualization based on a first set of data retrieved from a database using a first set of one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first set of one or more functional phrases; and update the data visualization based on the first set of one or more functional phrases.
[17]
17. Electronic device, according to claim 16, CHARACTERIZED by the fact that the one or more programs additionally comprise instructions for:
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases;
derive a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules;
calculate a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases; and update the data visualization based on the second set of one or more functional phrases.
[18]
18. Non-transitory computer-readable storage medium storing one or more programs configured to be executed by an electronic device with a display medium, CHARACTERIZED by the fact that
the one or more programs comprise instructions for:
display a data visualization based on a first set of data retrieved from a database using a first set of one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first set of one or more functional phrases; and update the data visualization based on the first set of one or more functional phrases.
[19]
19. Non-transitory computer-readable storage medium, according to claim 18, CHARACTERIZED by the fact that the one or more programs additionally comprise instructions for:
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases;
derive a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules;
calculate a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases; and update the data visualization based on the second set of one or more functional phrases.
[20]
20. Method to use natural language for visual analysis of a data set, CHARACTERIZED by comprising:
on a computer provided with a display medium, one or more processors, and a memory storing one or more programs configured to be executed by the one or more processors:
display a data visualization based on a first set of data retrieved from a database using a first set of one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first set of one or more functional phrases;
update the data view based on the first set of one or
more functional phrases;
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases;
calculate the cohesion between the first set of one or more analytical phrases and the second set of one or more analytical phrases, and derive a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers based on the cohesion;
calculate a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases; and update the data visualization based on the second set of one or more functional phrases.
[21]
21. Method, according to claim 20, CHARACTERIZED by the fact that calculating cohesion and deriving the second set of one or more conversation centers based on cohesion comprises:
identify a sentence structure from the second set of one or more analytical phrases;
identify one or more forms of pragmatics based on the sentence structure; and derive the second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers based on the one or more forms of pragmatics.
[22]
22. Method, according to claim 21, CHARACTERIZED by the fact that identifying the sentence structure comprises:
analyze the second natural language command by applying a probabilistic grammar, thereby obtaining an analyzed output; and resolve the analyzed output into corresponding data and categorical attributes.
[23]
23. Method, according to claim 22, CHARACTERIZED by the fact that analyzing the second natural language command additionally comprises deducing the syntactic structure using a morphosyntactic analysis API provided by a library of tools for natural language processing.
[24]
24. Method, according to claim 21, CHARACTERIZED by the fact that:
identifying one or more forms of pragmatics involves determining whether the second natural language command is an incomplete statement by determining whether one or more linguistic elements are missing from the sentence structure; and deriving the second set of one or more conversation centers comprises:
according to the determination that the second natural language command is an incomplete statement:
determining a first subset of conversation centers in the first set of one or more conversation centers, the first subset of conversation centers corresponding to one or more linguistic elements missing from the sentence structure; and
calculate the second set of one or more conversation centers by combining the temporary set of one or more conversation centers with the first subset of conversation centers.
[25]
25. Method, according to claim 21, CHARACTERIZED by the fact that:
identifying one or more forms of pragmatics involves determining whether the second natural language command is a reference expression by determining whether one or more anaphoric references are present in the sentence structure; and deriving the second set of one or more conversation centers comprises:
according to the determination that the second natural language command is a reference expression:
search the first set of one or more conversation centers to find a first subset of conversation centers that corresponds to a phrasal block in the second natural language command that contains a first anaphoric reference from one or more anaphoric references; and calculating the second set of one or more chat centers based on the temporary set of one or more chat centers and the first subset of chat centers.
[26]
26. The method of claim 25, further characterized by comprising:
determine whether the first anaphoric reference is accompanied by a verb in the second natural language command;
according to a determination that the anaphoric reference is accompanied by a verb:
search the first set of one or more conversation centers for
find a first action conversation center that refers to an action verb, and calculate the second set of one or more conversation centers based on the temporary set of one or more conversation centers, the first subset of conversation centers, and the first action conversation center.
[27]
27. The method of claim 25, further characterized by comprising:
determine whether the first anaphoric reference is a deictic reference that refers to an object in the environment;
according to a determination that the anaphoric reference is a deictic reference, calculate the second set of one or more conversation centers based on the temporary set of one or more conversation centers, and a characteristic of the object.
[28]
28. The method of claim 25, further comprising:
determine whether the first anaphoric reference is a reference to a visualization property in the updated data visualization;
according to a determination that the anaphoric reference is a deictic reference, calculate the second set of one or more conversation centers based on the temporary set of one or more conversation centers, and data related to the visualization property.
[29]
29. Method, according to claim 21, CHARACTERIZED by the fact that:
identifying one or more forms of pragmatics involves determining whether the second natural language command is a repair statement by determining whether the sentence structure corresponds to one or more predefined repair statements; and deriving the second set of one or more conversation centers comprises:
according to the determination that the second natural language command is a repair statement:
calculate the second set of one or more conversation centers based on the temporary set of one or more conversation centers; and update one or more data attributes in the second set of one or more conversation centers based on the one or more predefined repair statements and the sentence structure.
[30]
30. The method of claim 29, further comprising:
determine whether the sentence structure corresponds to a repair statement to change a default behavior related to the display of a data visualization; and according to a determination that the sentence structure corresponds to a repair statement to modify a default behavior, modify the default behavior related to the display.
[31]
31. Method, according to claim 21, CHARACTERIZED by the fact that:
identifying one or more forms of pragmatics involves determining whether the second natural language command is a conjunctive expression by (i) determining the implicit or explicit presence of conjunctions in the sentence structure, and (ii) determining whether the temporary set of one or more conversation centers includes each conversation center in the first set of one or more conversation centers; and deriving the second set of one or more conversation centers comprises:
according to the determination that the second natural language command is a conjunctive expression, calculate the second set of one or more conversation centers based on the temporary set of one or more conversation centers.
[32]
32. The method of claim 31, further comprising:
determine if the second natural language command has more than one set; and according to the determination that the second natural language command has more than one set, calculate the second set of one or more analytical functions by linearizing the second natural language command.
[33]
33. Method, according to claim 32, CHARACTERIZED by the fact that linearizing the second natural language command comprises:
generate an analysis tree for the second natural language command;
traverse the analysis tree in post-order to extract a first analytical phrase and a second analytical phrase, where the first analytical phrase and the second analytical phrase are adjacent nodes in the analysis tree;
calculate a first analytical function and a second analytical function corresponding to the first analytical phrase and the second analytical phrase, respectively; and combine the first analytical function with the second analytical function by applying one or more logical operators based on one or more characteristics of the first analytical function and the second analytical function, where the one or more characteristics include type of attribute, type of operator and a value.
[34]
34. Method, according to claim 33, CHARACTERIZED by the fact
that:
the first analytical function comprises a first attribute, a first operator and a first value;
the second analytical function comprises a second attribute, a second operator and a second value; and combining the first analytical function with the second analytical function comprises:
determine whether the first attribute is a categorical type attribute or an ordered type attribute, and determine whether the second attribute is a categorical type attribute or an ordered type attribute;
determine whether the first attribute and the second attribute are identical; and according to a determination that the first attribute and the second attribute are identical and both are categorical attributes, apply a union operator to combine the first analytic function and the second analytic function.
[35]
35. Method, according to claim 33, CHARACTERIZED by the fact that:
the first analytical function comprises a first attribute, a first operator and a first value;
the second analytical function comprises a second attribute, a second operator and a second value; and combining the first analytical function with the second analytical function comprises:
determine whether the first attribute is a categorical type attribute or an ordered type attribute, and determine whether the second attribute is a categorical type attribute or an ordered type attribute;
determine whether the first attribute and the second attribute are identical; and according to a determination that the first attribute and the second
attributes are non-identical, apply the intersection operator to combine the first analytic function and the second analytic function.
[36]
36. Method, according to claim 33, CHARACTERIZED by the fact that:
the first analytical function comprises a first attribute, a first operator and a first value;
the second analytical function comprises a second attribute, a second operator and a second value; and combining the first analytical function with the second analytical function comprises:
determine whether the first attribute is a categorical type attribute or an ordered type attribute, and determine whether the second attribute is a categorical type attribute or an ordered type attribute;
determine whether the first attribute and the second attribute are identical; and according to a determination that the first attribute and the second attribute are identical, and both are attributes of the ordered type:
determine the types of operators of the first operator and the second operator; and according to a determination that both the first operator and the second operator are equality operators, apply the union operator to combine the first analytic function and the second analytic function.
[37]
37. Method, according to claim 33, CHARACTERIZED by the fact that:
the first analytical function comprises a first attribute, a first operator and a first value;
the second analytical function comprises a second attribute, a second operator and a second value; and
combining the first analytical function with the second analytical function comprises:
determine whether the first attribute is a categorical type attribute or an ordered type attribute, and determine whether the second attribute is a categorical type attribute or an ordered type attribute;
determine whether the first attribute and the second attribute are identical; and according to a determination that the first attribute and the second attribute are identical, and both are attributes of the ordered type:
determine the types of operators of the first operator and the second operator; and according to a determination that the first operator is a “less than” operator and the second operator is a “greater than” operator:
determine whether the first value is less than the second value; and according to a determination that the first value is less than the second value, apply the union operator to combine the first analytical function and the second analytical function.
[38]
38. Method, according to claim 33, CHARACTERIZED by the fact that:
the first analytical function comprises a first attribute, a first operator and a first value;
the second analytical function comprises a second attribute, a second operator and a second value; and combining the first analytical function with the second analytical function comprises:
determine whether the first attribute is a categorical type attribute or an ordered type attribute, and determine whether the second attribute is a categorical type attribute or an ordered type attribute;
determine whether the first attribute and the second attribute are identical; and according to a determination that the first attribute and the second attribute are identical, and both are attributes of the ordered type:
determine the types of operators of the first operator and the second operator; and according to a determination that the first operator is a “greater than” operator and the second operator is a “less than” operator:
determine whether the first value is less than the second value; and according to a determination that the first value is less than the second value, apply the intersection operator to combine the first analytical function and the second analytical function.
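Claims 34 to 38 together define a small decision table for choosing a union or an intersection when combining two analytic functions. The Python sketch below encodes those rules directly; the AnalyticFunction structure and the attribute/operator spellings are assumptions, and cases the claims do not cover are left undetermined.

```python
from dataclasses import dataclass

@dataclass
class AnalyticFunction:
    """Minimal stand-in for a filter-style analytic function: attribute, operator, value."""
    attribute: str
    attr_type: str   # "categorical" or "ordered"
    operator: str    # e.g. "==", "<", ">"
    value: object

def combine(f1: AnalyticFunction, f2: AnalyticFunction) -> str:
    """Choose the logical operator used to combine two analytic functions,
    following the rules of claims 34-38 (other cases are left undecided here)."""
    same_attribute = f1.attribute == f2.attribute
    if not same_attribute:
        return "intersection"                                  # claim 35
    if f1.attr_type == "categorical" and f2.attr_type == "categorical":
        return "union"                                         # claim 34
    if f1.attr_type == "ordered" and f2.attr_type == "ordered":
        if f1.operator == "==" and f2.operator == "==":
            return "union"                                     # claim 36
        if f1.operator == "<" and f2.operator == ">" and f1.value < f2.value:
            return "union"                                     # claim 37 (disjoint ranges)
        if f1.operator == ">" and f2.operator == "<" and f1.value < f2.value:
            return "intersection"                              # claim 38 (overlapping range)
    return "undetermined"

# "price < 200 or price > 500" -> disjoint ranges, combined with a union
low = AnalyticFunction("price", "ordered", "<", 200)
high = AnalyticFunction("price", "ordered", ">", 500)
print(combine(low, high))        # union
print(combine(AnalyticFunction("region", "categorical", "==", "West"),
              AnalyticFunction("region", "categorical", "==", "East")))  # union
```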
[39]
39. The method of claim 20, further characterized by comprising:
calculate the semantic relatedness between the second set of one or more extracted analytical phrases and one or more data attributes included in the updated data view, and calculate analytical functions associated with the second set of one or more analytical phrases, thereby creating the second set of one or more functional phrases, based on the semantic relatedness of the one or more data attributes.
[40]
40. Method, according to claim 39, CHARACTERIZED by the fact that calculating the semantic relatedness comprises:
train a first neural network model on a large corpus of text, thereby learning vector representations of words;
calculate a first word vector for a first word in a first phrase in the second set of one or more analytical phrases using a second neural network model, the first word vector mapping the first word to the vector representations of words;
calculate a second word vector for a first data attribute in the one or more data attributes using the second neural network model, the second word vector mapping the first data attribute to the vector representations of words; and calculate the relatedness between the first word vector and the second word vector using a similarity metric.
[41]
41. Method, according to claim 40, CHARACTERIZED by the fact that the first neural network model comprises the Word2vec® model.
[42]
42. Method, according to claim 40, CHARACTERIZED by the fact that the second neural network model comprises a recurrent neural network model.
[43]
43. Method, according to claim 40, CHARACTERIZED by the fact that the similarity metric is based on at least (i) the Wu-Palmer distance between the word senses associated with the first word vector and the second word vector, (ii) a weighting factor, and (iii) a pairwise cosine distance between the first word vector and the second word vector.
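Claim 43 states that the similarity metric combines a Wu-Palmer distance between word senses, a weighting factor, and a pairwise cosine distance between word vectors, without fixing the exact formula. The Python sketch below assumes a simple convex combination purely for illustration; the Wu-Palmer score is taken as an input (it could come, for example, from WordNet word senses), and the vectors stand in for the outputs of the models in claims 40 to 42.

```python
import math
from typing import Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def semantic_relatedness(wup: float, u: Sequence[float], v: Sequence[float],
                         alpha: float = 0.5) -> float:
    """Blend the Wu-Palmer score of the two word senses with the cosine similarity
    of the two word vectors using a weighting factor alpha. The exact blend is not
    given in the claims; a convex combination is assumed here for illustration only."""
    return alpha * wup + (1.0 - alpha) * cosine_similarity(u, v)

# e.g. relating the query word "cheap" to the data attribute "price":
# wup would come from WordNet senses, u and v from the word-vector models above.
print(semantic_relatedness(0.8, [0.1, 0.9, 0.3], [0.2, 0.7, 0.1], alpha=0.6))
```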
[44]
44. Method, according to claim 39, CHARACTERIZED by the fact that calculating the analytical functions comprises:
obtain word definitions for the second set of one or more analytical phrases from a publicly available dictionary;
determine whether the word definitions contain one or more predefined adjectives using a morphosyntactic analysis API provided by a natural language processing toolkit; and according to a determination that the word definitions contain one or more predefined adjectives, map the one or more predefined adjectives to one or more analytical functions.
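As an illustration of claim 44, the sketch below uses WordNet (via NLTK) as the publicly available dictionary and NLTK's part-of-speech tagger as the morphosyntactic analysis API. The adjective-to-function mapping is a made-up placeholder, and the relevant NLTK data packages (wordnet, punkt, and the POS tagger) must be downloaded beforehand.

```python
# Rough sketch of claim 44 using NLTK; assumes the required NLTK data is installed.
import nltk
from nltk.corpus import wordnet

# Hypothetical mapping from adjectives found in definitions to analytic functions;
# the real mapping is not enumerated in the claims.
ADJECTIVE_TO_FUNCTION = {
    "low": "filter(attribute, '<', threshold)",
    "high": "filter(attribute, '>', threshold)",
    "inexpensive": "filter(price, '<', threshold)",
}

def analytic_functions_for(word: str):
    """Look up dictionary definitions of a word, POS-tag them, and map any
    predefined adjectives they contain to analytic functions."""
    functions = []
    for synset in wordnet.synsets(word):
        definition = synset.definition()
        for token, tag in nltk.pos_tag(nltk.word_tokenize(definition)):
            # 'JJ*' tags mark adjectives in the Penn Treebank tag set.
            if tag.startswith("JJ") and token.lower() in ADJECTIVE_TO_FUNCTION:
                functions.append(ADJECTIVE_TO_FUNCTION[token.lower()])
    return functions

# e.g. "cheap" is defined in WordNet using adjectives such as "low",
# which maps to a '<' filter in this illustrative table.
print(analytic_functions_for("cheap"))
```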
[45]
45. Electronic device, CHARACTERIZED by comprising:
a display medium;
one or more processors;
memory; and one or more programs, in which the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
display a data visualization based on a first set of data retrieved from a database using a first set of one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first set of one or more functional phrases;
update the data visualization based on the first set of one or more functional phrases;
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second
set of one or more analytical phrases;
calculate the cohesion between the first set of one or more analytical phrases and the second set of one or more analytical phrases;
derive a second set of one or more chat centers from the first set of one or more chat centers and the temporary set of one or more chat centers based on cohesion;
calculate a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases; and update the data visualization based on the second set of one or more functional phrases.
[46]
46. Non-transitory computer-readable storage medium storing one or more programs configured for execution by an electronic device with a display medium, CHARACTERIZED by the fact that the one or more programs comprise instructions for:
display a data visualization based on a first set of data retrieved from a database using a first set of one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first
set of one or more functional phrases;
update the data visualization based on the first set of one or more functional phrases;
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases;
calculate the cohesion between the first set of one or more analytical phrases and the second set of one or more analytical phrases;
derive a second set of one or more chat centers from the first set of one or more chat centers and the temporary set of one or more chat centers based on cohesion;
calculate a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases; and update the data visualization based on the second set of one or more functional phrases.
[47]
47. Method to use natural language for visual analysis of a data set, CHARACTERIZED by comprising:
on a computer provided with a display medium, one or more processors, and a memory storing one or more programs configured to be executed by the one or more processors:
display a data visualization based on a first set of data retrieved from a database using a first set of
one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first set of one or more functional phrases;
update the data visualization based on the first set of one or more functional phrases;
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases;
derive a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules; and update the data view based on the second set of one or more conversation centers.
[48]
48. Method, according to claim 47, CHARACTERIZED by the fact
that updating the data visualization based on the second set of one or more conversation centers comprises:
determine one or more data attributes corresponding to the second set of one or more conversation centers;
scan the displayed data views to identify one or more of the displayed data views that contain data marks whose characteristics correspond to a first data attribute of the one or more data attributes; and highlight the data marks whose characteristics correspond to the first data attribute.
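A minimal sketch of the scan-and-highlight behavior of claims 48 to 50 follows, assuming a very reduced in-memory model of data marks and their characteristics; the data structures and field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataMark:
    """A single mark in a displayed visualization, with the attribute values
    it encodes (e.g. the categorical value shown by its color)."""
    characteristics: Dict[str, str]
    highlighted: bool = False

@dataclass
class DataVisualization:
    marks: List[DataMark] = field(default_factory=list)

def highlight_matching_marks(views: List[DataVisualization], attribute: str, value: str) -> None:
    """Scan the displayed views and highlight marks whose characteristics
    correspond to the given data attribute/value (claim 48)."""
    for view in views:
        for mark in view.marks:
            if mark.characteristics.get(attribute) == value:
                mark.highlighted = True

def filter_matching_marks(view: DataVisualization, attribute: str, value: str) -> DataVisualization:
    """Alternative behavior (claims 49-50): keep only the marks that match."""
    kept = [m for m in view.marks if m.characteristics.get(attribute) == value]
    return DataVisualization(marks=kept)

view = DataVisualization([DataMark({"region": "West"}), DataMark({"region": "East"})])
highlight_matching_marks([view], "region", "West")
print([m.highlighted for m in view.marks])                       # [True, False]
print(len(filter_matching_marks(view, "region", "West").marks))  # 1
```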
[49]
49. Method, according to claim 48, CHARACTERIZED in that it additionally comprises filtering results of the displayed data visualizations that contain data marks whose characteristics do not correspond to the one or more data attributes.
[50]
50. Method, according to claim 49, CHARACTERIZED in that it further comprises receiving input from the user to determine whether to filter or highlight the data marks, and filtering or highlighting the data marks in the displayed data views based on the determination.
[51]
51. Method, according to claim 48, CHARACTERIZED by the fact that the visualization characteristics include one or more of color, size, and shape.
[52]
52. Method, according to claim 48, CHARACTERIZED by the fact that the visualization characteristics correspond to a visual encoding of the data marks.
[53]
53. Method, according to claim 52, CHARACTERIZED by the fact that the visual encoding is one or more of color, size, and shape.
[54]
54. The method of claim 48, further comprising:
determine whether none of the displayed data views contains data marks whose characteristics match the first data attribute; and according to the determination that none of the displayed data views contains data marks whose characteristics match the first data attribute:
generate a specification for a new data view with the first data attribute; and display the new data view.
[55]
55. Method, according to claim 54, CHARACTERIZED by the fact that displaying the new data visualization additionally comprises:
determine a chart type based on the specification; and generate and display the chart.
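Claim 55 determines a chart type from the generated specification. The heuristics below are an assumed Show-Me-style mapping from attribute types to chart types, included only to illustrate what such a determination could look like; the claims do not fix this mapping.

```python
def choose_chart_type(spec: dict) -> str:
    """Pick a chart type from a minimal visualization specification.
    The rules are illustrative assumptions, not the patented mapping."""
    attr_types = [a["type"] for a in spec["attributes"]]
    if "temporal" in attr_types:
        return "line chart"
    if attr_types.count("quantitative") >= 2:
        return "scatter plot"
    if "categorical" in attr_types and "quantitative" in attr_types:
        return "bar chart"
    return "text table"

spec = {"attributes": [{"name": "region", "type": "categorical"},
                       {"name": "sales", "type": "quantitative"}]}
print(choose_chart_type(spec))   # bar chart
```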
[56]
56. Method, according to claim 55, CHARACTERIZED by the fact that the chart is positioned using a layout algorithm based on a two-dimensional grid, automatically coordinated with other data visualizations.
[57]
57. The method of claim 47, further comprising:
calculate a second set of one or more analytical functions associated with the second set of one or more conversation centers, thereby creating a second set of one or more functional phrases;
select a first functional phrase from the second set of one or more functional phrases, in which the first functional phrase comprises a parameterized data selection criterion;
select an initial range for parameter values of the parameterized data selection criterion;
display an editable user interface control matching the parameterized data selection criterion, where the user interface control displays the current values of the parameters; and order a displayed set of one or more editable user interface controls based on the order of the queries in the second natural language command, in which the order of the queries is inferred while extracting the second set of one or more analytical phrases from the second natural language command.
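To illustrate claim 57, the sketch below models a parameterized data selection criterion with an initial range and a current value, and orders textual stand-ins for the editable controls by the position of the originating query in the command; the data structure and labels are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ParameterizedCriterion:
    """A functional phrase such as "price under 600k" turned into an adjustable filter."""
    attribute: str
    initial_range: Tuple[float, float]
    current_value: float
    query_position: int   # position of the originating query within the command

def ordered_controls(criteria: List[ParameterizedCriterion]) -> List[str]:
    """Render one textual stand-in for a slider control per criterion, ordered by the
    position of the corresponding query in the natural language command (claim 57)."""
    labels = []
    for c in sorted(criteria, key=lambda c: c.query_position):
        low, high = c.initial_range
        labels.append(f"{c.attribute}: [{low} .. {high}], current = {c.current_value}")
    return labels

criteria = [
    ParameterizedCriterion("beds", (1, 6), 3, query_position=1),
    ParameterizedCriterion("price", (0, 1_000_000), 600_000, query_position=0),
]
print(ordered_controls(criteria))
# ['price: [0 .. 1000000], current = 600000', 'beds: [1 .. 6], current = 3']
```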
[58]
58. Method, according to claim 57, CHARACTERIZED by the fact that the user interface control allows the adjustment of the first functional phrase.
[59]
59. Method, according to claim 58, CHARACTERIZED by the fact that the user interface control displays a slider, which allows a user to adjust the first functional phrase.
[60]
60. Method, according to claim 57, CHARACTERIZED by the fact that ordering the displayed set of one or more editable user interface controls additionally comprises using a library that facilitates the compact placement of small word-scale visualizations within text.
[61]
61. Method, according to claim 60, CHARACTERIZED by the fact that the library is Sparklificator®.
[62]
62. The method of claim 47, further comprising:
determine a first symbol in the second natural language command that does not match any of the analytical phrases in the second set of one or more analytical phrases;
search for a correctly spelled term corresponding to the first symbol using a search library by comparing the first symbol with one or more aspects of the first data set;
replace the first symbol with the correctly spelled term in the second natural language command to obtain a third natural language command; and extract the second set of one or more analytical phrases from the third natural language command.
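Claim 62 substitutes a correctly spelled term for an unrecognized symbol by fuzzy-matching it against aspects of the data set (claim 63). The claims name Fuse.js as the search library; the sketch below uses Python's standard difflib as a stand-in purely for illustration, with made-up data set aspects.

```python
import difflib
from typing import List, Optional

def correct_token(token: str, dataset_aspects: List[str]) -> Optional[str]:
    """Look up a correctly spelled replacement for an unrecognized symbol by fuzzy-matching
    it against data attributes, cell values and related keywords; difflib stands in here
    for the Fuse.js search library named in the claims."""
    matches = difflib.get_close_matches(token.lower(),
                                        [a.lower() for a in dataset_aspects],
                                        n=1, cutoff=0.75)
    return matches[0] if matches else None

def repair_command(command: str, recognized_tokens: set, dataset_aspects: List[str]) -> str:
    """Replace each unrecognized symbol with its closest match, yielding the 'third natural
    language command' of claim 62; symbols with no close match are left for later pruning."""
    repaired = []
    for token in command.split():
        if token.lower() in recognized_tokens:
            repaired.append(token)
        else:
            repaired.append(correct_token(token, dataset_aspects) or token)
    return " ".join(repaired)

aspects = ["price", "region", "sales", "profit"]
print(repair_command("show pricee by regon",
                     recognized_tokens={"show", "by"},
                     dataset_aspects=aspects))
# show price by region
```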
[63]
63. Method according to claim 62, CHARACTERIZED by the fact that the one or more aspects include data attributes, cell values and related keywords from the first data set.
[64]
64. Method, according to claim 62, CHARACTERIZED by the fact that the search library is Fuse.js®.
[65]
65. The method of claim 62, further comprising:
determine whether there is no correctly spelled term corresponding to the first symbol; and according to a determination that there is no correctly spelled term corresponding to the first symbol:
analyze the second natural language command to obtain an analysis tree;
prune the analysis tree to remove the portion of the tree corresponding to the first symbol; and extract the second set of one or more analytical phrases based on the pruned analysis tree.
[66]
66. Method, according to claim 65, CHARACTERIZED by additionally comprising generating textual feedback indicating that the first symbol was not recognized and was therefore removed from the second natural language command.
[67]
67. Method according to claim 62, CHARACTERIZED by
additionally comprising displaying the first symbol.
[68]
68. Method, according to claim 62, CHARACTERIZED by additionally comprising generating textual feedback indicating that the correctly spelled term is used as a substitute for the first symbol in the second natural language command.
[69]
69. Method, according to claim 68, CHARACTERIZED in that it additionally comprises displaying and highlighting the correctly spelled term.
[70]
70. Electronic device, CHARACTERIZED by comprising:
a display medium;
one or more processors;
memory; and one or more programs, in which the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
display a data visualization based on a first set of data retrieved from a database using a first set of one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first set of one or more functional phrases;
update the data visualization based on the first set of one or more functional phrases;
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases;
derive a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules; and update the data visualization based on the second set of one or more conversation centers, where the update comprises:
determine one or more data attributes corresponding to the second set of one or more conversation centers;
scan the displayed data views to identify one or more of the displayed data views that contain data marks whose characteristics correspond to a first data attribute of the one or more data attributes; and highlight the data marks whose characteristics correspond to the first data attribute.
[71]
71. Non-transitory computer-readable storage medium storing one or more programs configured for execution by an electronic device with a display medium, CHARACTERIZED by the fact that the one or more programs comprise instructions for:
display a data visualization based on a first set of
data retrieved from a database using a first set of one or more queries;
receive a first user input to specify a first natural language command related to data visualization;
extract a first set of one or more independent analytical phrases from the first natural language command;
calculate a first set of one or more conversation centers associated with the first natural language command based on the first set of one or more analytical phrases;
calculate a first set of analytical functions associated with the first set of one or more conversation centers, thus creating a first set of one or more functional phrases;
update the data visualization based on the first set of one or more functional phrases;
receive a second user input to specify a second natural language command related to the updated data view;
extract a second set of one or more independent analytical phrases from the second natural language command;
calculate a temporary set of one or more conversation centers associated with the second natural language command based on the second set of one or more analytical phrases;
derive a second set of one or more conversation centers from the first set of one or more conversation centers and the temporary set of one or more conversation centers using one or more transition rules; and update the data visualization based on the second set of one or more conversation centers, where the update comprises:
determine one or more data attributes corresponding to the second set of one or more conversation centers;
scan the displayed data views to identify one or more of the displayed data views that contain data marks whose characteristics correspond to a first data attribute of the one or more data attributes; and highlight the data marks whose characteristics correspond to the first data attribute.